Transactions Papers

Use of Residual Redundancy in the 
Design of Joint Source/Channel Coders 


Khalid Sayood, Member, IEEE, and Jay C. Borkenhagen, Member, IEEE


Abstract — The need to transmit large amounts of data over 
a band-limited channel has led to the development of various 
data compression schemes. Many of these schemes function by 
attempting to remove redundancy from the data stream. An 
unwanted side-effect of this approach is to make the information 
transfer process more vulnerable to channel noise. Efforts at pro- 
tecting against errors involve the reinsertion of redundancy and 
an increase in bandwidth requirements. We present a technique 
for providing error protection without the additional overhead 
required for channel coding. We start from the premise that, 
during source coder design, for the sake of simplicity or due 
to imperfect knowledge, assumptions have to be made about 
the source which are often incorrect. This results in residual 
redundancy at the output of the source coder. The residual 
redundancy can then be used to provide error protection in much 
the same way as the insertion of redundancy in convolutional 
coding provides error protection. In this paper we develop an 
approach for utilizing this redundancy. To show the validity of 
this approach, we apply it to image coding using DPCM, and 
obtain substantial performance gains, both in terms of objective 
as well as subjective measures. 


I. Introduction 

THE utilization of redundancy removal techniques is motivated
by the need for more efficient channel utilization
for data transmission and reduced memory requirements for 
data storage. The source coder removes redundancy from the 
data, and this redundancy is later reinserted at the receiver. 
The design of the source coder is, in general, based solely 
on the source statistics. The removal of redundancy makes 
the transmitted data especially vulnerable to channel noise. To 
combat the effect of channel noise, channel coding is used, 
which entails the controlled insertion of redundancy into the 
transmitted data [1]. The channel coding procedures are de- 
signed without any reference to the source characteristics. This 
approach is justified in some sense by an important result of 
Shannon [2] which shows that the source and channel coding 
operations can be separated without any loss of optimality.
However, in Shannon’s work there is no constraint on the 
complexity of the coders involved. In practical systems, where 
there are limits on complexity, this separation may not be 
possible [3]. 

Shannon showed that for systems transmitting at a rate 
below the capacity of the channel, essentially error-free trans- 
mission can be attained. If the link between the source coder 
and decoder is error-free, there is no need for the effect 
of errors to be taken into account in the design of the 
source coder/decoder pair since there are no errors. However, 
most communication links are not error-free, and the effect 
of these errors has to be studied and if possible mitigated. 
Modestino, Daut, and Vickers [4] in their study of transform 
coding of images for transmission over noisy channels have 
shown that for moderate input signal-to-noise ratios, the use 
of a greater average number of bits per coefficient actually 
degrades the performance of the system, and reduces the output 
signal-to-noise ratio. To combat this effect, they examine 
tradeoffs between allocating bits for source or channel coding. 
Comstock and Gibson [5] extend this work and provide an 
explicit mechanism for allocating bits between the source 
coder and a Hamming channel coder. A similar situation is 
evident for DPCM systems. As can be seen from Fig. 1, 
while the higher rate DPCM system performs better under 
relatively noiseless conditions, its performance drops below 
that of the lower rate system under noisier conditions. Thus, 
under very noisy channel conditions, the DPCM system that 
provides the lowest source coding distortion turns out to have 
the highest overall distortion. Chang and Donaldson [6] present 
an analysis of the DPCM system operating under noisy channel 
conditions. Examining the case where no separate channel 
coding is being used, they demonstrate the usefulness of 
incorporating both channel and source statistics in the design 
of the source coder. 

It seems clear from the above that there is a need to consider 
the effect of channel errors when designing source coders. An 
early effort in this direction was the difference detection and 
correction scheme of Steele, Goodman, and McGonegal [7], 
[8] for broadcast-quality speech. In this scheme the receiver 
infers an error has occurred whenever an individual sample- 
difficult to analyze unless several simplifying assumptions are
made about the quantizer input and the quantization noise. 
This is usually the strategy followed, with the quantizer being 
viewed as a source of additive white noise [20]. (It should be
noted that this assumption is incorrect [21]-[23]; however,
its use simplifies the analysis, which in turn provides useful
insight into the workings of the overall system.) Second, while
the predictor design is based on a model for the source, this 
model might not be (and usually is not) matched to the source. 
This results in some residual correlation in the DPCM output, 
especially at low bit rates. However, most analyses which take 
into account the source model are based on the assumption that 
the model accurately represents the source. 

In this section we study DPCM systems designed under 
the assumption that the source is an autoregressive process 
of unknown order. The predictor design is based on this 
assumption of an autoregressive source. However, as the order 
of the source process is unknown, the predictor may still be 
mismatched to the source. By mismatch we mean that the 
order of the process and hence the predictor coefficients are 
different from the autoregressive coefficients of the source. 

First we will develop some results for the DPCM system 
without the quantizer in the feedback loop. The motivation 
for this is the same as the fine quantization assumption used 
by McDonald [25] and others; namely, simplicity. Without 
the quantizer in the loop, the DPCM encoder becomes a 
simple linear filter and consequently easy to analyze. Later, the 
quantizer will be introduced into the feedback loop, and using 
the results derived for the quantizerless system, the statistical 
structure of the encoder output will be studied. 

The source is assumed to be an autoregressive process of
order M, generated according to the difference equation

$$x(n) = \sum_{i=1}^{M} a_i x(n-i) + \epsilon(n) \qquad (1)$$

where $\epsilon(n)$ is a zero-mean white sequence with variance $\sigma_\epsilon^2$.
The following relationships can easily be obtained for the
quantizerless system. If in the design of the DPCM system
the source is assumed to be an autoregressive process of order
N, the predictor output is given by

$$\hat{x}(n \mid n-1) = \sum_{i=1}^{N} b_i x(n-i). \qquad (2)$$

The prediction error e(n) is given by

$$e(n) = x(n) - \hat{x}(n \mid n-1). \qquad (3)$$

If N is equal to M, and $b_i$ is equal to $a_i$ for all i, then, of
course,

$$e(n) = \epsilon(n). \qquad (4)$$

We have assumed our source to be an autoregressive process
of order M. Such a process can be viewed as the output of a
linear filter driven by white noise. Let $H_1(z)$ be the z-domain
transfer function of a discrete time linear filter such that

$$H_1(z) = \frac{1}{1 - \sum_{i=1}^{M} a_i z^{-i}}. \qquad (5)$$

Then, mixing operator and time domain notation,

$$x(n) = H_1 \epsilon(n). \qquad (6)$$

Similarly, defining $H_2(z)$ as

$$H_2(z) = 1 - \sum_{i=1}^{N} b_i z^{-i},$$

$$e(n) = H_2 x(n). \qquad (7)$$

Substituting the expression for x(n) from (6),

$$e(n) = H_2 H_1 \epsilon(n) \qquad (8)$$

where

$$H_2(z) H_1(z) = \frac{1 - \sum_{i=1}^{N} b_i z^{-i}}{1 - \sum_{i=1}^{M} a_i z^{-i}}. \qquad (9)$$

Therefore,

$$e(n) = \sum_{i=1}^{M} a_i e(n-i) - \sum_{i=1}^{N} b_i \epsilon(n-i) + \epsilon(n) \qquad (10)$$

and e(n) is an ARMA(M, N) process. Notice that if M = N
and $a_i = b_i$ for all i, then

$$H_2(z) H_1(z) = 1 \qquad (11)$$

and again

$$e(n) = \epsilon(n).$$

If the quantizer is now introduced into the loop, the prediction,
and hence the prediction error, becomes contaminated
by the quantization noise. For this situation, we obtain the
following analogs of (2) and (3). The predictor output is given
by

$$\tilde{x}(n \mid n-1) = \sum_{i=1}^{N} b_i \hat{x}(n-i) \qquad (12)$$

and the contaminated prediction error $\tilde{e}(n)$ is given by

$$\tilde{e}(n) = x(n) - \tilde{x}(n \mid n-1). \qquad (13)$$

Using an additive noise model for the quantizer, we can write
the output of the quantizer $\hat{e}(n)$ as

$$\hat{e}(n) = \tilde{e}(n) + q(n) \qquad (14)$$

where q(n) is the quantization noise. The output of the DPCM
receiver (and the input to the predictor) $\hat{x}(n)$ is given by

$$\hat{x}(n) = \tilde{x}(n \mid n-1) + \hat{e}(n). \qquad (15)$$

Substituting the expression for $\hat{e}(n)$ from (14),

$$\hat{x}(n) = \tilde{x}(n \mid n-1) + \tilde{e}(n) + q(n). \qquad (16)$$

Then using (13),

$$\hat{x}(n) = x(n) + q(n). \qquad (17)$$
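The relationships above can be checked numerically. The following is a minimal sketch, not the simulation used in this paper: the AR(2) coefficients, the one-tap predictor value 0.7, the uniform quantizer with step 0.5, and all function names are assumptions chosen for illustration. It shows that a matched predictor leaves an essentially white prediction error, that a mismatched predictor leaves residual correlation in the quantizerless output, and that with the quantizer in the loop the reconstruction differs from the input only by the quantization noise, as in (17).

```python
import random


def ar_source(a, n, seed=1):
    """x(n) = sum_i a[i-1] * x(n-i) + eps(n), with eps(n) ~ N(0, 1), as in (1)."""
    rng = random.Random(seed)
    x = []
    for t in range(n):
        past = sum(ai * x[t - i] for i, ai in enumerate(a, start=1) if t >= i)
        x.append(past + rng.gauss(0.0, 1.0))
    return x


def prediction_error(x, b):
    """Quantizerless DPCM output e(n) = x(n) - sum_i b[i-1] * x(n-i), as in (2)-(3)."""
    return [x[t] - sum(bi * x[t - i] for i, bi in enumerate(b, start=1) if t >= i)
            for t in range(len(x))]


def lag1_correlation(e):
    """Normalized sample autocorrelation of e at lag one."""
    mean = sum(e) / len(e)
    d = [v - mean for v in e]
    return sum(d[t] * d[t - 1] for t in range(1, len(d))) / sum(v * v for v in d)


def dpcm_loop(x, b, step):
    """DPCM with a uniform quantizer in the loop; returns (x_hat, q)."""
    x_hat, q = [], []
    for t in range(len(x)):
        pred = sum(bi * x_hat[t - i] for i, bi in enumerate(b, start=1) if t >= i)
        e = x[t] - pred                  # contaminated prediction error, (13)
        e_hat = step * round(e / step)   # quantizer output (uniform, an assumption)
        q.append(e_hat - e)              # quantization noise, from (14)
        x_hat.append(pred + e_hat)       # receiver output, (15)
    return x_hat, q


a = [0.9, -0.2]                          # hypothetical AR(2) source (M = 2)
x = ar_source(a, 20000)
rho_matched = lag1_correlation(prediction_error(x, a))         # N = M, b = a
rho_mismatched = lag1_correlation(prediction_error(x, [0.7]))  # N = 1 < M
x_hat, q = dpcm_loop(x, [0.7], step=0.5)
max_dev = max(abs(xh - (xv + qv)) for xh, xv, qv in zip(x_hat, x, q))
print(rho_matched, rho_mismatched, max_dev)
```

For these assumed parameters the mismatched error keeps a clearly nonzero lag-one correlation, which is the residual redundancy the decoder later exploits.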
As a first step towards this objective, we use the definition
of conditional probability to write L as

$$L = \frac{P[\theta_i = \alpha_j, \hat{\theta}_i = \alpha_n, \theta_{i-1} = \alpha_m]}{P[\hat{\theta}_i = \alpha_n, \theta_{i-1} = \alpha_m]}. \qquad (25)$$

This can be rewritten as (26) shown below. In (26),
$P[\theta_{i-1} = \alpha_m]$ appears both in the numerator and the
denominator and can therefore be canceled. Also, assuming
a memoryless channel, and using conditional independence,

$$P[\hat{\theta}_i = \alpha_n \mid \theta_i = \alpha_j, \theta_{i-1} = \alpha_m] = P[\hat{\theta}_i = \alpha_n \mid \theta_i = \alpha_j]. \qquad (27)$$

L thus becomes

$$L = \frac{P[\hat{\theta}_i = \alpha_n \mid \theta_i = \alpha_j] \, P[\theta_i = \alpha_j \mid \theta_{i-1} = \alpha_m]}{P[\hat{\theta}_i = \alpha_n \mid \theta_{i-1} = \alpha_m]}. \qquad (28)$$


The numerator of L is now in terms of the transition 
probabilities of the source encoder and the channel transition 
probabilities. To obtain the denominator of L in the same form 
we make the following claim. 

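Claim:

$$P[\hat{\theta}_i = \alpha_n \mid \theta_{i-1} = \alpha_m] = \sum_l \left\{ P[\hat{\theta}_i = \alpha_n \mid \theta_i = \alpha_l] \, P[\theta_i = \alpha_l \mid \theta_{i-1} = \alpha_m] \right\}. \qquad (29)$$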


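Proof of Claim: Let

$$D = \sum_l \left\{ P[\hat{\theta}_i = \alpha_n \mid \theta_i = \alpha_l] \, P[\theta_i = \alpha_l \mid \theta_{i-1} = \alpha_m] \right\} \qquad (30)$$

$$= \sum_l \left\{ P[\hat{\theta}_i = \alpha_n \mid \theta_i = \alpha_l] \, \frac{P[\theta_i = \alpha_l, \theta_{i-1} = \alpha_m]}{P[\theta_{i-1} = \alpha_m]} \right\} \qquad (31)$$

$$= \frac{1}{P[\theta_{i-1} = \alpha_m]} \sum_l \left\{ P[\hat{\theta}_i = \alpha_n \mid \theta_i = \alpha_l] \, P[\theta_{i-1} = \alpha_m, \theta_i = \alpha_l] \right\}. \qquad (32)$$

Assuming a memoryless channel,

$$P[\hat{\theta}_i = \alpha_n \mid \theta_i = \alpha_l] = P[\hat{\theta}_i = \alpha_n \mid \theta_i = \alpha_l, \theta_{i-1} = \alpha_m]. \qquad (33)$$

Therefore,

$$D = \frac{1}{P[\theta_{i-1} = \alpha_m]} \sum_l \left\{ P[\theta_{i-1} = \alpha_m, \theta_i = \alpha_l] \, P[\hat{\theta}_i = \alpha_n \mid \theta_i = \alpha_l, \theta_{i-1} = \alpha_m] \right\}$$

$$= \frac{1}{P[\theta_{i-1} = \alpha_m]} \sum_l P[\hat{\theta}_i = \alpha_n, \theta_i = \alpha_l, \theta_{i-1} = \alpha_m] = \frac{P[\hat{\theta}_i = \alpha_n, \theta_{i-1} = \alpha_m]}{P[\theta_{i-1} = \alpha_m]} \qquad (34)$$

which by the definition of conditional probability proves the
claim.

Thus, the L(j,m,n) metric can finally be written as

$$L = \frac{P[\hat{\theta}_i = \alpha_n \mid \theta_i = \alpha_j] \, P[\theta_i = \alpha_j \mid \theta_{i-1} = \alpha_m]}{\sum_l \left\{ P[\hat{\theta}_i = \alpha_n \mid \theta_i = \alpha_l] \, P[\theta_i = \alpha_l \mid \theta_{i-1} = \alpha_m] \right\}} \qquad (35)$$

(alternate derivations can be found in [27], [28]).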

If we examine the expression for L we notice that it is made
up of two sets of conditional probabilities

$$P[\hat{\theta}_i = \alpha_j \mid \theta_i = \alpha_n], \quad \text{and} \quad P[\theta_i = \alpha_j \mid \theta_{i-1} = \alpha_m], \qquad j, m, n = 1, \ldots, N \qquad (36)$$

where N is the size of the source coder output alphabet. Thus,
to obtain L we need to estimate $2N^2$ quantities. For most data
compression systems this number is usually quite small. The
estimation of these quantities was clearly not a large burden
on the two-, three-, and four-bit DPCM systems we studied
where $2N^2$ was 32, 128, and 512, respectively. Notice that
the first set of conditional probabilities depends only on the 
channel statistics, and the second set of transition probabilities 
depends only on the source coder and source statistics. If we 
assume a binary symmetric model for our channel, then given 
the binary code used over the channel and an estimate of the 
channel probability of error, it is a simple task to obtain the 
first set of transition probabilities. To obtain the second set of 
transition probabilities, we found it most convenient to use a 
training sequence. The use of a training sequence raises the 
question of how robust this approach will be if the training 
sequence is not exactly similar to the test sequence. Our results 
in the next section show that the approach is reasonably robust 
in terms of differences between training and test sequences. 
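The construction of the metric from its two sets of probabilities can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used for the results below: the function names, the add-one smoothing of the transition counts, the toy training sequence, and the error probability 0.05 are all hypothetical. The channel set is computed from the binary code and a binary symmetric channel model, the source set is estimated from a training sequence, and the two are combined according to (35).

```python
def channel_table(codewords, p):
    """chan[j][n] = P[received a_n | sent a_j] over a binary symmetric channel."""
    k = len(codewords[0])
    table = []
    for sent in codewords:
        row = []
        for recv in codewords:
            d = sum(s != r for s, r in zip(sent, recv))   # Hamming distance
            row.append(p ** d * (1.0 - p) ** (k - d))
        table.append(row)
    return table


def source_table(training, n_symbols):
    """src[m][j] estimates P[theta_i = a_j | theta_{i-1} = a_m] from a training sequence.

    Counts start at one (add-one smoothing, an assumption) so no estimate is zero.
    """
    counts = [[1.0] * n_symbols for _ in range(n_symbols)]
    for prev, cur in zip(training, training[1:]):
        counts[prev][cur] += 1.0
    return [[c / sum(row) for c in row] for row in counts]


def metric_table(chan, src):
    """table[j][m][n] = chan[j][n] * src[m][j] / sum_l chan[l][n] * src[m][l], as in (35)."""
    size = len(src)
    table = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    for m in range(size):
        for n in range(size):
            denom = sum(chan[l][n] * src[m][l] for l in range(size))
            for j in range(size):
                table[j][m][n] = chan[j][n] * src[m][j] / denom
    return table


codewords = ["00", "01", "10", "11"]                    # two-bit system, N = 4
training = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1] * 50    # hypothetical training data
chan = channel_table(codewords, p=0.05)
src = source_table(training, n_symbols=4)
L = metric_table(chan, src)
n_estimated = sum(len(r) for r in chan) + sum(len(r) for r in src)   # 2 * N**2
```

Because the channel table depends only on the code and the error probability, a change of source or source coder requires re-estimating only `src`.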


IV. Results 

We simulated the proposed approach using differential PCM 
(DPCM) as our test system. It should be emphasized that 


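$$L = \frac{P[\hat{\theta}_i = \alpha_n \mid \theta_i = \alpha_j, \theta_{i-1} = \alpha_m] \, P[\theta_i = \alpha_j \mid \theta_{i-1} = \alpha_m] \, P[\theta_{i-1} = \alpha_m]}{P[\hat{\theta}_i = \alpha_n \mid \theta_{i-1} = \alpha_m] \, P[\theta_{i-1} = \alpha_m]} \qquad (26)$$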






IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 39, NO. 6, JUNE 1991 



Fig. 7. Two- and three-bit proposed system performance in the presence of
channel noise (horizontal axis: probability of error).


dictor coefficient is 0.778625. Notice the large improvement 
in the noisy channel performance over the previous figure. 
While the performance improvement with the JSCD is still 
on the order of 3 dB, the absolute value of the signal-to- 
noise ratio is such as to make the system feasible at relatively 
high probabilities of error. Figs. 5 and 6 repeat the results of 
the previous figure except the two-bit quantizer is replaced 
by three- and four-bit quantizers, respectively. Notice that the
performance improvements are in the range of 6-8 dB. The 
results are especially striking in the four-bit case where the 
use of the joint source/channel decoder improves the signal- 
to-noise ratio of the RDPCM system by about 8 dB at high 
probability of error. The improvement of the RDPCM system 
with the JSCD over the DPCM system is about 16 dB! 

Because of the improvement in noisy channel performance 
obtained by using the Chang and Donaldson predictor, we 
have incorporated that into our design, and from now on when 
we mention the proposed system we mean a DPCM system 
with a one-tap Chang and Donaldson predictor and the JSCD
in front of the source decoder. The performance of the two- 
and three-bit proposed system is shown in Fig. 7. In some 
ways this is the most interesting of all the graphs. Notice 
that contrary to the classical DPCM system behavior shown 
in Fig. 1, the three-bit DPCM+JSCD consistently outperforms 
the two-bit system. Thus, the paradoxical situation of getting 
poorer performance for a higher rate no longer exists. Also the 
performance curves remain relatively flat, which is in marked 
contrast to the performance shown in Fig. 1. This indicates 
a more graceful degradation in performance as the channel 
becomes noisier. 

While the objective performance of the proposed system 
(in terms of signal-to-noise ratio) is impressive, the final 
arbiter for any image coding scheme has to be the human 
eye. Figs. 8-10 present images coded using DPCM and the 
proposed system and transmitted over channels with different 
probability of error. Fig. 8 contains the test results for a two- 
bit system, while Figs. 9 and 10 contain results for three- 
and four-bit systems, respectively. In each case the image 
labeled “(a)” is DPCM coded with a channel probability of 
error of 0.02, while the image labeled “(b)” is coded with the 
proposed system under the same channel condition. The image 
labeled “(c)” is DPCM coded with a channel probability of
error of 0.10, while the image labeled “(d)” is coded with the
proposed system, with a channel probability of error of 0.10.
The improvement in the perceptual quality of the images is
truly striking, especially in the case of the channel with 0.10
probability of error. In this case the DPCM coded image is
reduced to a nearly indistinguishable blur, while the image
coded with the proposed system is reasonably clean. This is
especially encouraging as it means that even for such high
error rates the proposed system makes the channel usable for
image transmission.

Fig. 8. U.S.C. GIRL image coded at two bits per pixel (a) using DPCM with
channel error probability of 0.02, (b) using the proposed system with channel
error probability of 0.02, (c) using DPCM with channel error probability of
0.1, (d) using the proposed system with channel error probability of 0.1.

Fig. 9. U.S.C. GIRL image coded at three bits per pixel (a) using DPCM
with channel error probability of 0.02, (b) using the proposed system with
channel error probability of 0.02, (c) using DPCM with channel error
probability of 0.1, (d) using the proposed system with channel error probability
of 0.1.

Finally, we examine the issue of performance indicators 
which could be used by the system designer to get an idea 
about the performance improvements available through the 
1. INTRODUCTION

One of Shannon's many fundamental contributions was his result
that source coding and channel coding can be treated separately without
any loss of performance for the overall system [1]. The basic design
procedure is to select a source encoder which changes the source sequence
into a series of independent, equally likely binary digits followed
by a channel encoder which accepts binary digits and puts them into
a form suitable for reliable transmission over the channel. However,
the separation argument no longer holds if either of the following two
situations occurs:

i. The input to the source decoder is different from the
output of the source encoder, which happens when the
link between the source encoder and source decoder is
no longer error free, or

ii. the source encoder output contains redundancy.

Of course, case (i) occurs when the channel coder does not achieve
zero error probability and case (ii) occurs when the source encoder is
suboptimal. These two situations are common occurrences in practical
systems where source or channel models are imperfectly known, complexity
is a serious issue, or significant delay is not tolerable. Various
approaches have been developed for such situations. They are usually
grouped under the general heading of joint source/channel coding.

Most of the various joint source channel coding approaches can
be classified in two main categories: (A) approaches which entail the
modification of the source coder/decoder structure to reduce the effect
of channel errors, and (B) approaches which examine the distribution of
bits between the source and channel coders. The first set of approaches
can be divided still further into two classes. One class of approaches
examines the modification of the overall structure, while the other deals
with the modification of the decoding procedure to take advantage of
the redundancy in the output of the source coder.

To the first class belongs the work of Dunham and Gray [2], who
proved the existence of joint source channel trellis coding systems for
certain fidelity criteria, and a design of a joint source channel trellis
coder presented by Ayanoglu and Gray [3], where the design procedure
is a generalized Lloyd algorithm. Further, Massey [4] and Ancheta
[5] showed that for distortionless transmission of the source using linear
joint source channel encoders, equivalent performance can be obtained
with a significant reduction in complexity. Chang and Donaldson [6]
propose modifications to the DPCM system to reduce the effect of
channel errors, while Kurtenbach and Wintz [7] and Farvardin and
Vaishampayan [8] study the problem of optimum quantizer design for
noisy channels. Goodman and Sundberg [9,10] propose an embedded
DPCM system which consists of a two bit DPCM and a two bit PCM
system in parallel.

In the second class of category A, we include the work of Reininger
and Gibson [11], who use the fact that coefficients in neighboring blocks
in a transform coding scheme will not vary greatly, and thus use coefficients
from neighboring blocks to correct a possible error, and the work
of Steele, Goodman and McGonegal [12,13], who propose a difference
detection and correction scheme for broadcast quality speech. In this
scheme the receiver infers an error whenever an individual sample to
sample difference is greater than the mean squared difference of a 71
sample sliding block. When an error is detected, the received sample
is replaced by the output of a smoothing circuit. Ngan and Steele [14]
use a similar method for recovering from errors in a transmission
system. Sayood and Borkenhagen [15,16] use the redundancy at
the source coder output to perform sequence estimation.
The work of Modestino, Daut and Vickers [17] belongs to category
B. In their study of transform coding they examine tradeoffs between
allocating bits for source and channel coding. Comstock and Gibson
[18] extend this work and provide an explicit mechanism for allocating
bits between a source coder and a Hamming channel coder. Additionally,
Moore and Gibson [19] study the allocation of bits between a
DPCM coder and self orthogonal convolutional coding.

In this paper we present a maximum a posteriori probability (MAP)
approach to joint source/channel coder design, which belongs to category
A, and hence we explore a technique for designing joint source/
channel coders, rather than ways of distributing bits between source
coders and channel coders. We assume that the two nonideal situations
referred to earlier are present. Our approach is as follows. For a
nonideal source coder, we use MAP arguments to design a decoder
which takes advantage of redundancy in the source coder output to
perform error correction. Once the decoder is obtained, we analyze
it with the purpose of obtaining "desirable properties" of the channel
input sequence for improving overall system performance. We then
propose an encoder design which incorporates these properties.

2. THE MAP DESIGN CRITERION 

For a discrete memoryless channel (DMC), let the channel input
alphabet be denoted by $A = \{a_0, a_1, \ldots, a_{M-1}\}$, and the channel
input and output sequences by $Y = \{y_0, y_1, \ldots, y_{L-1}\}$ and
$\hat{Y} = \{\hat{y}_0, \hat{y}_1, \ldots, \hat{y}_{L-1}\}$, respectively. If $\mathcal{A} = \{A_i\}$ is the set of sequences
$A_i = \{a_{i,0}, a_{i,1}, \ldots, a_{i,L-1}\}$, $a_{i,k} \in A$, then the optimum receiver (in
the sense of maximizing the probability of making a correct decision)
maximizes $P[C]$, where

$$P[C] = \sum_{\hat{Y}} P[C \mid \hat{Y}] \, P[\hat{Y}].$$

This in turn implies that the optimum receiver maximizes $P[C \mid \hat{Y}]$.
When the receiver selects the output to be $A_k$, then $P[C \mid \hat{Y}] = P[Y = A_k \mid \hat{Y}]$.
Thus, the optimum receiver selects the sequence $A_k$ such that

$$P[Y = A_k \mid \hat{Y}] \ge P[Y = A_i \mid \hat{Y}] \quad \forall i.$$

Lemma 1

Let $y_i$ be the input to a DMC. Given $y_{i-1}$, $y_i$ is conditionally
independent of $y_{i-k}$, $k > 1$. If $\hat{y}_0 = y_0$ then the optimum receiver selects a
sequence $A_k$ to maximize $\prod_{i=1}^{L-1} P(y_i \mid y_{i-1}, \hat{y}_i)$.

Proof:

From the preceding result, the receiver tries to maximize $P(Y \mid \hat{Y})$.
Using the chain rule we can write this as

$$P[Y \mid \hat{Y}] = P(y_0, y_1, \ldots, y_{L-1} \mid \hat{y}_0, \hat{y}_1, \ldots, \hat{y}_{L-1})$$

$$= P(y_{L-1} \mid y_{L-2}, y_{L-3}, \ldots, y_0, \hat{y}_0, \ldots, \hat{y}_{L-1}) \, P(y_{L-2} \mid y_{L-3}, \ldots, y_0, \hat{y}_0, \ldots, \hat{y}_{L-1}) \cdots P(y_0 \mid \hat{y}_0, \ldots, \hat{y}_{L-1}).$$

The last factor on the right hand side (RHS) is equal to one. Using
the assumption of the DMC, we obtain

$$P(Y \mid \hat{Y}) = \prod_{i=1}^{L-1} P(y_i \mid y_{i-1}, \hat{y}_i). \qquad (1)$$

□

The lemma addresses the situation in case (ii), i.e., the situation
in which the source coder output (which is also the channel input
sequence) contains redundancy. Using this lemma, we can design a
decoder which will take advantage of dependence in the channel input
sequence.
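A decoder of this kind can be sketched as a standard Viterbi search over the product in (1), taken in logarithmic form. The sketch below is only illustrative: the two-symbol alphabet, the transition values in `src` and `chan`, and the function names are assumptions, and each factor $P(y_i \mid y_{i-1}, \hat{y}_i)$ is evaluated here by Bayes' rule from an assumed channel table and an assumed source transition table. The first symbol is taken as received correctly, matching the condition $\hat{y}_0 = y_0$ of the lemma.

```python
import math


def branch_metric(chan, src):
    """Return f(cur, prev, recv) = log P[y_i = cur | y_{i-1} = prev, yhat_i = recv]."""
    size = len(src)

    def metric(cur, prev, recv):
        denom = sum(src[prev][l] * chan[l][recv] for l in range(size))
        return math.log(src[prev][cur] * chan[cur][recv] / denom)

    return metric


def map_decode(received, metric, size):
    """Viterbi search for the sequence maximizing the sum of log branch metrics."""
    score = {received[0]: 0.0}          # first symbol assumed correct
    back = []
    for recv in received[1:]:
        new_score, ptr = {}, {}
        for cur in range(size):
            best_prev = max(score, key=lambda prev: score[prev] + metric(cur, prev, recv))
            new_score[cur] = score[best_prev] + metric(cur, best_prev, recv)
            ptr[cur] = best_prev
        score = new_score
        back.append(ptr)
    state = max(score, key=score.get)
    path = [state]
    for ptr in reversed(back):          # trace the surviving path backwards
        state = ptr[state]
        path.append(state)
    return path[::-1]


src = [[0.95, 0.05], [0.05, 0.95]]      # hypothetical src[prev][cur]
chan = [[0.9, 0.1], [0.1, 0.9]]         # hypothetical chan[sent][recv]
received = [0, 0, 1, 0, 0, 0]           # all-zero sequence with one symbol error
decoded = map_decode(received, branch_metric(chan, src), size=2)
```

With these assumed values the strong dependence between successive channel inputs outweighs the single received symbol that disagrees, so the isolated error is corrected.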




3. DECODER DESIGN

The lemma of the previous section provides the mathematical structure
for the decoder. The physical structure can be easily obtained
by examining the quantity to be maximized. The decoder maximizes
$P(Y \mid \hat{Y})$ or equivalently $\log P(Y \mid \hat{Y})$, but

$$\log P(Y \mid \hat{Y}) = \sum_{i=1}^{L-1} \log P(y_i \mid \hat{y}_i, y_{i-1}) \qquad (2)$$

and various solutions exist for the maximization of additive path metrics.
To implement this decoder we need to be able to compute the
path metric. This task is considerably eased by the following lemma.

Lemma 2: Let A be the channel input alphabet and $\{y_i\}$ and $\{\hat{y}_i\}$ be
the input and output sequences of a DMC. Then

$$P[y_i = a_j \mid y_{i-1} = a_m, \hat{y}_i = a_n] = \frac{P[\hat{y}_i = a_n \mid y_i = a_j] \, P[y_i = a_j \mid y_{i-1} = a_m]}{\sum_l P[y_i = a_l \mid y_{i-1} = a_m] \, P[\hat{y}_i = a_n \mid y_i = a_l]} \qquad (3)$$

Proof: See [16].

The expression on the RHS of (3), while it looks more complicated,
is actually a much more tractable form of the desired conditional probability.
Note that this expression is a function of two distinct sets of
transition probabilities, the channel transition probabilities and the
source coder output transition probabilities. As the channel transition
probabilities depend only on the channel, and the source coder output
transition probabilities depend only on the source coder and source
probabilities, the two sets of transition probabilities can be estimated
independently. The two can then be combined according to (3) to construct
an M x M x M lookup table for use in decoding. If the source
coder or source changes, the only parameters to be modified are the
source coder output transition probabilities. Results using a DPCM
source coder with an image as the source are presented in Section 6.

4. DECODER ANALYSIS

In the previous section we developed a scheme for providing error
correction using the redundancy in the channel input sequence, or the
source coder output. We looked at the design of the decoder given
a source coder or channel input sequence with some rather general
statistical properties. In this section we examine the reverse problem.
That is, given the decoder obtained in the previous section we look
for "desired properties" of the channel input sequence and, hence, the
source coder.

To obtain the desired properties we need to examine the factors
involved in the error correcting capability of the decoder. Toward
this end let us examine the following situation. Referring to Figure 1,
assume that the correct sequence of transmitted codewords is $a_0 a_0 a_0$.
An error occurs if the path metric for $a_0 a_j a_0$ is greater than the path
metric of $a_0 a_0 a_0$. Assume $\hat{y}_1 = a_n$ and $\hat{y}_2 = a_0$. An error occurs if the
following quantity is positive.

$$\begin{aligned}
&\log \frac{P[\hat{y}_1 = a_n \mid y_1 = a_j] \, P[y_1 = a_j \mid y_0 = a_0]}{\sum_l P[y_1 = a_l \mid y_0 = a_0] \, P[\hat{y}_1 = a_n \mid y_1 = a_l]}
+ \log \frac{P[\hat{y}_2 = a_0 \mid y_2 = a_0] \, P[y_2 = a_0 \mid y_1 = a_j]}{\sum_l P[y_2 = a_l \mid y_1 = a_j] \, P[\hat{y}_2 = a_0 \mid y_2 = a_l]} \\
&- \log \frac{P[\hat{y}_1 = a_n \mid y_1 = a_0] \, P[y_1 = a_0 \mid y_0 = a_0]}{\sum_l P[y_1 = a_l \mid y_0 = a_0] \, P[\hat{y}_1 = a_n \mid y_1 = a_l]}
- \log \frac{P[\hat{y}_2 = a_0 \mid y_2 = a_0] \, P[y_2 = a_0 \mid y_1 = a_0]}{\sum_l P[y_2 = a_l \mid y_1 = a_0] \, P[\hat{y}_2 = a_0 \mid y_2 = a_l]}
\end{aligned} \qquad (4)$$

Defining

$$d_{ij} = \text{Hamming distance between } a_i \text{ and } a_j$$

$$q_{ij} = P(y_n = a_i \mid y_{n-1} = a_j) \qquad (5)$$

then using (5) and simplifying (4), an error occurs if

$$(d_{nj} - d_{n0}) \log \frac{p}{1-p} + \log \frac{q_{j0}}{q_{00}} + \log \frac{q_{0j}}{q_{00}} - \log \frac{\sum_l q_{lj} \left( \frac{p}{1-p} \right)^{d_{l0}}}{\sum_l q_{l0} \left( \frac{p}{1-p} \right)^{d_{l0}}} > 0 \qquad (6)$$

where p denotes the bit error probability of the channel.
Defining $\alpha = (1-p)/p$, (6) can be rewritten as

(8)


The left hand side is maximized when j = n ($d_{nj} = 0$). Thus an error
occurs when the number of bit errors, which in this case is $d_{n0}$, is
greater than the quantity on the LHS of (8) or

$$d_{n0} > \frac{1}{\log \alpha} \left( \log \frac{q_{00}}{q_{j0}} + \log \frac{q_{00}}{q_{0j}} + \log \frac{\sum_l \alpha^{-d_{l0}} q_{lj}}{\sum_l \alpha^{-d_{l0}} q_{l0}} \right). \qquad (9)$$

The alternative path shown in Figure 1 is only one of the possible paths.
Another longer alternative path is shown in Figure 2.

In this case the number of errors required to take the alternative path
is given by


rf„o > ~ — ( log + log 

logo \ 


yo u 

' y., 


yoo 

yo». 


) 


( 10 ) 


Notice that the number of errors required to take a longer incorrect
path (a path with more branches) is larger than for a shorter incorrect
path. To make our statements more concrete, we define a parameter
we call the error correction capability I as

$$I = 1 - \frac{H(y_n \mid y_{n-1})}{\log M}
= 1 - \frac{1}{\log M} \sum_{i,k} P(y_n = a_i, y_{n-1} = a_k) \log \frac{1}{P(y_n = a_i \mid y_{n-1} = a_k)}
= 1 - \frac{1}{\log M} \sum_{i,k} P(y_n = a_i, y_{n-1} = a_k) \log \frac{1}{q_{ik}} \qquad (11)$$


where M is the size of the channel input alphabet. We immediately
note the following properties of I:

(i) I is a convex cup function of the conditional probabilities
$\{P(y_n \mid y_{n-1})\}$.

(ii) $0 \le I \le 1$.

Further properties of I are developed in the following lemmas.
Lemma 3: If I is zero for a particular channel input sequence, the
decoder will not correct any errors.

Proof:

I is zero when

$$\sum_{i,k} P(y_n = a_i, y_{n-1} = a_k) \log \frac{1}{q_{ik}} = \log M.$$

This is true when $\log (1/q_{ik}) = \log M$ for all i, k. In this condition the right
hand side of (9) is zero, giving the desired result.

□

Lemma 4: If I is one for a particular input sequence, the decoder
obtains the correct sequence with probability one.

Proof:

For I to be one, $H(y_n \mid y_{n-1})$ has to be zero. This is true if for each
$k_0$ there exists an $i_0$ such that

$$q_{i k_0} = \begin{cases} 1, & i = i_0 \\ 0, & i \ne i_0. \end{cases}$$

This in turn implies that there exists some $i_0$ such that


Thus 


**>-{:: \*z. 




and the decoder will pick the correct sequence with probability one.

□
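The two extreme cases covered by Lemmas 3 and 4 can be illustrated numerically. The following is a small sketch, assuming the reading of I as one minus the normalized conditional entropy of the channel input given its predecessor; the function name, the alphabet size of four, and the uniform distribution assumed for the previous symbol are illustrative choices only.

```python
import math


def error_correction_capability(prior, q):
    """I = 1 - H(y_n | y_{n-1}) / log M.

    q[i][k] = P(y_n = a_i | y_{n-1} = a_k) and prior[k] = P(y_{n-1} = a_k).
    """
    size = len(prior)
    entropy = 0.0
    for k in range(size):
        for i in range(size):
            if q[i][k] > 0.0:           # 0 log 0 is taken as zero
                entropy -= prior[k] * q[i][k] * math.log(q[i][k])
    return 1.0 - entropy / math.log(size)


M = 4
uniform_prior = [1.0 / M] * M
# No dependence between successive symbols: q_ik = 1/M for all i, k (Lemma 3).
memoryless = [[1.0 / M] * M for _ in range(M)]
# Exactly one possible successor for each symbol (Lemma 4).
deterministic = [[1.0 if i == (k + 1) % M else 0.0 for k in range(M)]
                 for i in range(M)]
I_low = error_correction_capability(uniform_prior, memoryless)
I_high = error_correction_capability(uniform_prior, deterministic)
```

Intermediate transition matrices give values between these two extremes.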

The above two lemmas provide a relationship between the value of
I and the error correcting capability of the decoder, for the extreme
values of I. To obtain an insight into the relationship for other values
of I we look at a simplified version of (9). Assume that the size of the
channel input alphabet is two, then (9) simplifies to


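$$d_{10} > \frac{1}{\log \alpha} \left( \log \frac{q_{00}}{q_{10}} + \log \frac{q_{00}}{q_{01}} + \log \frac{q_{01} + \alpha^{-d_{10}} q_{11}}{q_{00} + \alpha^{-d_{10}} q_{10}} \right). \qquad (12)$$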


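Noting that $q_{00} = 1 - q_{10}$, $q_{01} = 1 - q_{11}$, and for the right hand side
to be positive $q_{00} > 1/2$ and $q_{01} < 1/2$, we can rewrite (12) as

$$d_{10} > \frac{1}{\log \alpha} \left( \log \frac{q_{00}}{q_{10}} + \log \frac{1 + \alpha^{-d_{10}} \, q_{11}/q_{01}}{1 + \alpha^{-d_{10}} \, q_{10}/q_{00}} \right). \qquad (13)$$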


In (13) the larger the right hand side the greater is the error correcting
capability of the receiver. The right hand side can be increased by
decreasing $q_{01}$ below 1/2. Thus the error correcting capability increases
as $q_{01}$ decreases below 1/2. If we examine I we find that I increases as
$q_{01}$ decreases below 1/2. This is because

$$H(y_n \mid y_{n-1}) = p_0 \left( q_{00} \log \frac{1}{q_{00}} + q_{10} \log \frac{1}{q_{10}} \right) + p_1 q_{01} \log \frac{1}{q_{01}} + p_1 (1 - q_{01}) \log \frac{1}{1 - q_{01}} \qquad (14)$$

decreases with $q_{01}$ decreasing below 1/2. Thus for this simple example,
an increase in I means an increase in the error correcting capability of
the decoder.

5. ENCODER DESIGN 

In the previous section we obtained desirable properties for the
channel input/encoder output sequence. In this section we examine
ways of incorporating these desirable properties into the encoder. We
wish to do this without decreasing the redundancy removal capability
of the source coder, and if possible, without increasing the transmitted
bit rate. To see how to approach this problem let us first examine the
source coder for noiseless channels in some detail.

In general, a source coder consists of two operations, data compression
and data compaction [20]. The data compression operation
usually consists of redundancy removal and involves some loss of information.
Examples of data compression schemes are DPCM, transform
coding and vector quantization. The data compaction schemes are
information preserving. They may result in a variable rate output. Examples
include Huffman coding and runlength coding. Generally in
discussions of joint source/channel coder design, the data compaction
operations are not included. The reason for this is that due to the variable
rate output, the data compaction schemes are highly vulnerable
to channel noise and, therefore, are not considered for noisy channel
applications.

A possible way of achieving our objectives is to insert another operation
between the data compression and data compaction steps as
shown in Figure 3.

To satisfy our objectives, the Π operator should have the following
properties.

(a) The Π operator should perform distortionless encoding.

(b) The Π operator should increase the error correcting capability.

(c) The Π operator should not increase the bit rate. For the
case where the data compaction scheme is a Huffman coder,
this is equivalent to the condition that the output entropy
not be greater than the input entropy.

An example of the Π operator which satisfies (a) and (b) and which
can be modified to satisfy (c) functions as follows. Let the input to Π
be selected from the alphabet

$$A = \{a_0, a_1, \ldots, a_{N-1}\},$$

and let the output alphabet be denoted by

$$S = \{s_0, s_1, \ldots, s_{N^2-1}\}.$$

Then the input/output mapping is given by

$$x_n = a_i, \; x_{n-1} = a_j \;\Rightarrow\; y_n = s_{jN+i}. \qquad (15)$$

The effect of the Π operator is to increase the distance between
alternative sequences. To see this, let us construct a simple example.
Let $A = \{a_0, a_1\}$ and $S = \{s_0, s_1, s_2, s_3\}$; then

$y_n = s_0$ if $x_n = a_0$ and $x_{n-1} = a_0$,
$y_n = s_1$ if $x_n = a_1$ and $x_{n-1} = a_0$,
$y_n = s_2$ if $x_n = a_0$ and $x_{n-1} = a_1$,
$y_n = s_3$ if $x_n = a_1$ and $x_{n-1} = a_1$.

In this case if $y_n = s_0$, $y_{n+1}$ cannot be $s_2$ or $s_3$ because
$y_n = s_0$ means $x_n = a_0$, and $y_{n+1} = s_2$ or $s_3$ means
$x_n = a_1$. Thus a valid sequence cannot have $s_2$ or $s_3$
following $s_0$.

For simplicity let us ignore the Huffman coder and assign fixed
length codewords to the $s_i$ as

$s_0$: 00, $s_1$: 01, $s_2$: 10, $s_3$: 11

Now suppose the transmitted sequence was the all zero sequence,
the metric used was the Hamming distance, and the received sequence
is 00001000000000; that is, there is an error in the fifth bit. If the
receiver decoded the first four bits as $s_0 s_0$ then it cannot decode
the fifth and sixth bits as $s_2$ for the reason noted above. The only
two options are decoding them as $s_0$ or $s_1$. If we decoded them as
$s_0$, we could continue decoding the rest of the sequence as
$s_0 s_0 \ldots$, and the Hamming distance between the received and
decoded sequence would be one. If we decoded them as $s_1$, we would
have to decode the next set of two bits as $s_2$ or $s_3$ because $s_0$
cannot follow $s_1$. Decoding as $s_2$ gives the smallest Hamming
distance so we decode the seventh and eighth bits as $s_2$. This gives a
total Hamming distance of three for the incorrect path (two for $s_1$
against the received 10, and one for $s_2$ against the received 00).
Thus the receiver will select the correct path (the path
with the smallest Hamming distance).
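The comparison of the two candidate paths can be checked numerically. The following is a minimal sketch, assuming the two-letter input alphabet and the fixed length codewords of the example, and assuming that the encoder starts from the input $a_0$; the helper names `is_valid` and `hamming` are illustrative and do not come from the original work.

```python
# Illustrative check of the example: N = 2, symbols s0..s3 coded as 00, 01, 10, 11.
N = 2
CODE = {0: "00", 1: "01", 2: "10", 3: "11"}


def is_valid(symbols, prev_input=0):
    """Return True if the operator could have produced this symbol sequence.

    Symbol s_k stands for (previous input, current input) = (k // N, k % N),
    so the 'previous input' part of each symbol must equal the current input
    of the symbol before it. prev_input is the assumed initial state.
    """
    for k in symbols:
        if k // N != prev_input:
            return False
        prev_input = k % N
    return True


def hamming(symbols, received):
    """Hamming distance between the coded symbol sequence and the received bits."""
    bits = "".join(CODE[k] for k in symbols)
    return sum(a != b for a, b in zip(bits, received))


received = "00001000000000"          # all-zero sequence with an error in bit five
correct = [0] * 7                    # s0 s0 s0 s0 s0 s0 s0
alternative = [0, 0, 1, 2, 0, 0, 0]  # s0 s0 s1 s2 s0 s0 s0
forbidden = [0, 0, 2, 0, 0, 0, 0]    # s2 directly after s0 is not allowed

print(is_valid(correct), hamming(correct, received))
print(is_valid(alternative), hamming(alternative, received))
print(is_valid(forbidden))
```

Under these assumptions the all-zero path is valid at distance one, the path through $s_1 s_2$ is valid at distance three, and the sequence containing $s_2$ directly after $s_0$ is rejected as invalid.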

6. SIMULATION RESULTS

We present the results of simulating two different systems in this
section. The first set of results was obtained using a nonideal source
coder with the decoder proposed in Section 3. The second set of results
pertains to the system proposed in the previous section. In both cases
the data compression scheme is a DPCM system with a fixed one tap
predictor and a nonuniform Lloyd-Max quantiser.

The source for the first set of results is the USC GIRL image.
The source coder output transition probabilities were obtained using
a training set. The training image was the USC COUPLE image.
The performance measure was the peak signal-to-noise ratio (PSNR)
defined as

$\mathrm{PSNR\ (dB)} = 10 \log_{10} \left( \frac{255^2}{\langle (x_i - \hat{x}_i)^2 \rangle} \right)$

where $x_i$ is the input to the source coder while $\hat{x}_i$ is the
output of the source decoder, and $\langle \cdot \rangle$ denotes the
average over the image. Figure 4 shows the performance comparison for a
two bit per pixel system. Most of the performance improvement is
available at high probabilities of error. At these probabilities of
error, however, the improvement is substantial. Figure 5 shows the same
kind of results for a four bit per pixel system. The performance
improvements for this case are even more substantial than those for the
two-bit system. Two things are especially noteworthy in these results.
The first one is that the performance improvement does not really
become significant until the channel is very noisy. The other is that
the performance curve in the high noise region is relatively flat. This
means that even very noisy channels may be usable for image
transmission. Further results including perceptual results can be found
in [?].
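As an illustration of the performance measure, the following is a minimal sketch of a PSNR computation. It assumes 8-bit pixels (peak value 255) and a mean squared error averaged over all pixels; the function name `psnr_db` and the flat-list image representation are choices made for this example only.

```python
import math


def psnr_db(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("sequences must have the same length")
    # Mean squared error between source coder input and source decoder output.
    mse = sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# A reconstruction that is off by one grey level everywhere has MSE = 1.
print(psnr_db([100, 100, 100, 100], [101, 99, 101, 99]))
```

A higher value means the decoded image is closer to the original, so curves of PSNR against channel error probability can be compared directly.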

The second set of results was obtained using the approach
proposed in Section 5. The source encoder was replaced by the proposed
joint source/channel coder. The Π operator used is the one described
in the previous section. The source again was the USC GIRL image,
and end of line synchronisation was assumed. The performance
comparison is shown in Figure 6. Note that unlike the previous case, the
performance improvement occurs at both low and high error
probabilities. This makes the scheme especially useful for transmission
at low error rates.

CONCLUSIONS

In this paper we have presented a MAP approach to joint source/
channel coder design. The approach is based in part on the fact that
source coders are, in general, nonideal and, therefore, cannot remove
all redundancy from a source. This nonideality is taken advantage
of, by a MAP decoder, to correct errors. The decoder is analyzed to
obtain desired properties for the encoder output sequence. A joint
source/channel encoder design approach is presented which incorporates
the desired properties, and examples are given which show that
considerable performance improvements can be obtained with the
proposed approach.

ABSTRACT

One of Shannon’s many fundamental contributions was his result
that source coding and channel coding can be implemented
separately without any loss of optimality. However, the assumption
underlying this result may at times be violated in practice.
Various joint source/channel coding approaches have been
developed for handling such situations. A MAP approach to joint
source/channel coding has been proposed which uses a MAP
decoder and a modification of the source coder to provide error
correction. We present various implementation strategies for this
approach and provide results for an image coding application.


I. Introduction

One of Shannon’s many fundamental contributions was his result
that source coding and channel coding can be treated separately
without any loss of performance as compared to an optimum
system [1]. The basic design procedure implied by Shannon’s
theorems consists of designing a source encoder which changes the
source sequence into a series of (approximately) independent,
equally likely binary digits followed by a channel encoder which
accepts binary digits and puts them into a form suitable for
reliable transmission over the channel [2]. One aspect of the
overall optimum system not addressed by Shannon is any increase
in system complexity that results from this separation. Massey
[3] and Ancheta [4] showed that for distortionless transmission of
the source under the constraint of linear source and channel
coders, a significant reduction in complexity with equivalent
performance can be achieved by using a linear joint source/
channel coder. Their scheme also differs from most data
compression systems in that the bulk of the system complexity is
transferred to the receiver.

The theorem that provides the justification for the separate design
of the source coder and the channel coder, often called the
Information Transmission Theorem [2], assumes that both the
source encoder/decoder pair and the channel encoder/decoder pair
are operating in an optimal fashion. Specifically, the source
encoder is assumed to present the channel encoder with a sequence
suitable for optimal channel coding, and the channel encoder/
decoder pair is assumed to reproduce the source encoder output at
the source decoder input with negligible distortion. Unfortunately,
there are practical situations where these assumptions are
violated, namely, when the source encoder output contains
redundancy, which occurs if the source encoder is suboptimal, and
when the source decoder input differs from the source encoder
output, which is a result of channel errors. These two situations
are common occurrences in practical communication systems
where source and/or channel models are imperfectly known,
complexity is a serious issue, or significant delay is not tolerable.
Various approaches have been developed to handle these two
situations. These include approaches in which the source and
channel coding operations are truly integrated [3-6], approaches
that cascade known source coders with known channel coders and
allocate the fixed bit rate to the source coder and channel coder
to maximize system performance [7-15], approaches in which the
source coder and/or receiver is modified to account for the
presence of a given noisy channel [16-26], and approaches which
use some knowledge of the source and source coder properties to
detect channel errors and compensate for their effects [27-35].
The research described in this paper is concerned with the
implementation of a joint source/channel coder design which was
an extension of the work presented in [31,32]. This approach
utilizes structure in the source encoder output by using a MAP
decoder to correct errors introduced by the channel.

II. Previous Work

Based on the MAP design criterion, a decoder structure was
proposed in [32] which takes advantage of redundancy in the
channel input sequence to provide error correction. The decoder
maximizes the quantity $\log P(Y \mid \hat{Y})$ where

$Y = (y_1, y_2, \ldots)$

is the channel input sequence while

$\hat{Y} = (\hat{y}_1, \hat{y}_2, \ldots)$

is the channel output sequence. If a Markov model is imposed on
the channel input sequence, the path metric can be written as

$\log P(Y \mid \hat{Y}) = \sum_i \log P(y_i \mid \hat{y}_i, y_{i-1})$    (1)

and

$P(y_i = a_m \mid \hat{y}_i = a_n, y_{i-1} = a_j) = \frac{P(\hat{y}_i = a_n \mid y_i = a_m)\, P(y_i = a_m \mid y_{i-1} = a_j)}{\sum_l P(\hat{y}_i = a_n \mid y_i = a_l)\, P(y_i = a_l \mid y_{i-1} = a_j)}$    (2)

The proof of the above can be found in [31,32].
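For a fixed length code on a binary symmetric channel, the branch metric of (2) can be evaluated directly. The following is a minimal sketch under stated assumptions: the received word is $a_n$, the candidate is $a_m$, the previous symbol is $a_j$, the crossover probability is `p`, and the codewords and transition probabilities are toy values invented for illustration. The function names are hypothetical.

```python
import math


def channel_likelihood(received, sent, p):
    """P(received word | sent word) on a binary symmetric channel (equal lengths)."""
    d = sum(a != b for a, b in zip(received, sent))  # Hamming distance
    return p ** d * (1.0 - p) ** (len(sent) - d)


def map_branch_metric(n, m, j, codewords, transition, p):
    """log P(y_i = a_m | received a_n, y_{i-1} = a_j) for fixed-length codewords.

    transition[j][m] is P(y_i = a_m | y_{i-1} = a_j).
    """
    numerator = channel_likelihood(codewords[n], codewords[m], p) * transition[j][m]
    if numerator == 0.0:
        return -math.inf  # transition that the source coder never produces
    denominator = sum(
        channel_likelihood(codewords[n], codewords[k], p) * transition[j][k]
        for k in range(len(codewords))
    )
    return math.log(numerator / denominator)


# Toy numbers (assumed): every previous symbol strongly favours a_0 next.
codewords = ["00", "01", "10", "11"]
transition = [[0.7, 0.1, 0.1, 0.1]] * 4


def best_symbol(n, j, p):
    """Candidate index with the largest branch metric."""
    return max(range(4), key=lambda m: map_branch_metric(n, m, j, codewords, transition, p))


print(best_symbol(2, 0, 0.1))  # a_2 received on a cleaner channel
print(best_symbol(2, 0, 0.2))  # a_2 received on a noisier channel
```

With these toy values the decoder keeps the received symbol when the channel is relatively clean, and prefers the more probable source transition once the channel becomes noisy enough, which is the mechanism by which the residual redundancy corrects errors.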

Based on analysis of the decoder, a parameter called the error
correction capability was defined in [35] as

$I = 1 - H(y_n \mid y_{n-1}, \hat{y}_n) / \log M$    (3)

We noted that a desirable property of a joint source/channel coder
would be to increase $I$. The approach proposed for this requires
the modification of the source coder. In general, a source coder
consists of two operations, data compression and data compaction
[36]. The data compression operation usually consists of
redundancy removal and involves some loss of information. Examples
of data compression schemes are DPCM, transform coding and
vector quantization. The data compaction schemes are information
preserving. They may result in a variable rate output. Examples
include Huffman coding and runlength coding. Generally in
discussions of joint source/channel coder design, the data
compaction operations are not included. The reason for this is
that due to the variable rate output, the data compaction schemes
are highly vulnerable to channel noise and, therefore, are not
considered for noisy channel applications.


where $d_{nm}$ is the Hamming distance between the binary codewords
corresponding to $a_n$ and $a_m$, and $l$ is the number of bits in each
of the codewords. However, when $l(a_n) \neq l(a_m)$, the calculation
may not be as simple. To see this we need to introduce some more
notation. Let the codeword corresponding to $a_n$ be represented by

A possible approach to achieving the objective of increasing $I$ is
to insert an invertible operation between the data compression
and data compaction stages. An example of such an operation,
called the π operator, was presented in [35]. The operation
can be described as follows. Let the input to the π operator be
selected from the alphabet

$S = \{s_0, s_1, \ldots, s_{N-1}\}$

and let the output alphabet be denoted by

$A = \{a_0, a_1, \ldots, a_{N^2-1}\}$.

Then the input/output mapping is given by

$x_n = s_i,\ x_{n-1} = s_j \;\Rightarrow\; y_n = a_{jN+i}$

This operator and its effects are described in more detail in [35].

While this approach achieves the objective of increasing the error
correcting capability, it also results in a variable rate system. For
this situation the branch metric of the form of (2) becomes
difficult to implement. We explain these difficulties and propose
implementable approximations of the metrics in the next section.
Section V contains simulation results which demonstrate the
viability of these approximations. The use of a variable rate coder
also complicates the structure of the decoder. In Section IV we
present a modified Viterbi decoder which can be used with
variable rate codes.

III. Development of the Path Metric

Before we begin our discussion of the path metric for the variable rate
case we need to summarize the derivation of (2). The derivation
consists of two steps. First we show that

$P(y_i = a_m \mid \hat{y}_i = a_n, y_{i-1} = a_j) = \frac{P(\hat{y}_i = a_n \mid y_i = a_m)\, P(y_i = a_m \mid y_{i-1} = a_j)}{P(\hat{y}_i = a_n \mid y_{i-1} = a_j)}$    (4)

Then we show that the denominator can be written as

$P(\hat{y}_i = a_n \mid y_{i-1} = a_j) = \sum_l P(\hat{y}_i = a_n \mid y_i = a_l)\, P(y_i = a_l \mid y_{i-1} = a_j)$    (5)

Note first that in this derivation the channel input alphabet and
output alphabet are the same. We have assumed hard decision at
the output of the channel and for a fixed rate coder this translates
into identical alphabets at the input and output of the channel.
For the case where we have variable rate codes there is a subtle
difference. In fact, there are two different ways in which we can
view the output of the channel. The first approach is to assume
that there is a Huffman decoder at the output of the channel. The
Huffman decoder output alphabet is the same as the joint source/
channel (JSC) coder output alphabet. Thus the branch metric as
derived in (2) can be used directly. However, now the computation
of the individual factors of the branch metric becomes
somewhat more involved. Specifically, consider the calculation of
$P(\hat{y}_i = a_n \mid y_i = a_m)$, where the channel is assumed to be
a binary symmetric channel with known crossover probability $p$. Let
$l(a_j)$ be the number of bits in the binary codeword corresponding to
the symbol $a_j$.

If $l(a_n) = l(a_m)$, as is the case when a fixed rate code is used, then

$P(\hat{y}_i = a_n \mid y_i = a_m) = p^{d_{nm}} (1 - p)^{l - d_{nm}}$    (6)


Then, if $l(a_n)$ is less than $l(a_m)$

(7)

and the calculation is still relatively straightforward. However, if
$l(a_n)$ is greater than $l(a_m)$,

$P(\hat{y}_i = a_n \mid y_i = a_m) = \sum_l P(\hat{y}_i = a_n, y_{i+1} = a_l \mid y_i = a_m)$    (8)

or in more familiar terms

$P(\hat{y}_i = a_n \mid y_i = a_m) = \sum_l P(\hat{y}_i = a_n \mid y_i = a_m, y_{i+1} = a_l)\, P(y_{i+1} = a_l \mid y_i = a_m)$    (9)

where we have used the chain rule and the Markov property of the
JSC coder output. The second factor in the summand is simply
the transition probability of the JSC coder output while the first
factor can be calculated as

$P(\hat{y}_i = a_n \mid y_i = a_m, y_{i+1} = a_l) = \prod_{k=1}^{l(a_m)} \Pr(a_{n_k} \mid a_{m_k}) \prod_{k=l(a_m)+1}^{l(a_n)} \Pr(a_{n_k} \mid a_{l_{k-l(a_m)}})$

with $a_{n_k}$ denoting the $k$th bit of the codeword for $a_n$, as long
as $l(a_n)$ is less than or equal to $l(a_m) + l(a_l)$. If not we
simply repeat the process again to obtain

$P(\hat{y}_i = a_n \mid y_i = a_m, y_{i+1} = a_l) = \sum_h P(\hat{y}_i = a_n \mid y_i = a_m, y_{i+1} = a_l, y_{i+2} = a_h)\, P(y_{i+2} = a_h \mid y_{i+1} = a_l)$    (10)

Again $l(a_n)$ should be less than $l(a_m) + l(a_l) + l(a_h)$.

Obviously this process can continue if there is a large variation in
the codeword lengths. Therefore, this approach becomes cumbersome
for moderately large codebooks.


A somewhat different way of looking at this issue, suggested in a
slightly different context by Massey [37], is to block the channel
output bit stream into fixed length words where the fixed length
is longer than the longest binary codeword in the channel input.
Then, the path metric becomes the logarithm of

$\frac{P(\hat{y}_i = r \mid y_i = a_m)\, P(y_i = a_m \mid y_{i-1} = a_j)}{\sum_l P(\hat{y}_i = r \mid y_i = a_l)\, P(y_i = a_l \mid y_{i-1} = a_j)}$    (11)

where $r$ denotes the word corresponding to a received block of
bits. While there are some complications here as well, in the
interpretation of $P(\hat{y}_i \mid y_i)$, the main difficulty is a
computational one. The simplest implementation of the JSC decoder
requires that the path metrics be stored in a lookup table. In the case
of identical input and output alphabets of size $M$, the lookup table
size is $M^3$. However, with this approach, the lookup table size is
$M^2 2^{L+1}$ where $L$ is the length of the longest codeword. This
exponential increase with even moderate codeword lengths makes this
approach impractical at least for a lookup table implementation.
An implementation which does not use a lookup table, and instead
computes the path metric at each step may still be possible with
special purpose dedicated hardware.
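The growth of the lookup table can be made concrete with a few lines of arithmetic. This is only an illustrative sketch: the alphabet size of 16 and the longest codeword lengths tried below are assumed values, not figures reported in the paper.

```python
def table_sizes(M, L):
    """Return (entries with identical alphabets, entries with blocked channel output)."""
    same_alphabet = M ** 3            # one entry per (received, candidate, previous) triple
    blocked = M ** 2 * 2 ** (L + 1)   # received word replaced by an (L + 1)-bit block
    return same_alphabet, blocked


for longest in (4, 8, 12):
    print(longest, table_sizes(16, longest))
```

Each additional bit in the longest codeword doubles the size of the blocked table, while the table for identical alphabets stays fixed at $M^3$ entries.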




Given the difficulties involved with implementation of the exact
path metric of the MAP JSC decoder, we have proposed two
approximations which provide a high level of error protection
while being computationally simple and easy to implement. First
consider (4). We approximate the denominator as

$P(\hat{y}_i = r \mid y_{i-1} = a_j) \approx P(\hat{y}_i = r)$

and therefore the entire expression as

$P(y_i = a_m \mid \hat{y}_i = r, y_{i-1} = a_j) \approx \frac{P(\hat{y}_i = r \mid y_i = a_m)\, P(y_i = a_m \mid y_{i-1} = a_j)}{P(\hat{y}_i = r)}$    (12)

where the number of bits in $r$ is the number of bits used to
represent $a_m$. The denominator is further approximated by
assuming equally likely reception of bits as

$P(\hat{y}_i = r) = 2^{-l(a_m)}$    (13)

where $l(a_m)$ is the number of bits in $a_m$ and therefore in $r$. The
computation of the path metric then proceeds as follows: the
conditional probability $P(y_i = a_m \mid y_{i-1} = a_j)$ is read from a
lookup table and the channel transition probability
$P(\hat{y}_i = r \mid y_i = a_m)$ is computed by
assuming a binary symmetric channel with known crossover
probability. This form of the path metric is easy to implement
and the simulation results of Section V show the scheme to be
highly effective.
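The first approximation can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the received block and the candidate codeword are bit strings of equal length, `transition_prob` stands for the table entry $P(y_i = a_m \mid y_{i-1} = a_j)$, `p` is the crossover probability with $0 < p < 1$, and the function name is hypothetical.

```python
import math


def approx1_branch_metric(received_bits, codeword, transition_prob, p):
    """log of P(r | a_m) * P(a_m | a_j) / P(r), with P(r) taken as 2 ** (-l)."""
    l = len(codeword)
    if len(received_bits) != l:
        raise ValueError("received block must have as many bits as the codeword")
    if transition_prob == 0.0:
        return -math.inf  # transition that the coder never produces
    d = sum(a != b for a, b in zip(received_bits, codeword))  # bit errors implied
    return (d * math.log(p) + (l - d) * math.log(1.0 - p)
            + math.log(transition_prob) + l * math.log(2.0))


# An exact match scores higher than a one-bit mismatch for the same transition.
print(approx1_branch_metric("110", "110", 0.3, 0.05))
print(approx1_branch_metric("111", "110", 0.3, 0.05))
```

Working in the log domain keeps the path metric additive, so the per-branch values can be summed along a candidate path.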

An even simpler approximation is to use the Hamming distance
between the received bits and the candidate sequence elements as
the branch and path metric. Of course the candidate sequence
elements are selected from allowed sequence values. (Recall that
the π operator, by construction, disallows certain sequences.) We
present results using this metric in Section V. This approximation
causes a drop in performance from about a half dB in the low
noise region to about 1.5 to 2 dB in the high noise region. Given
the simplicity of implementation for this scheme, this may very
well be an acceptable cost.

Once the path metric has been obtained, the decoder structure
needs to be elucidated. We do so in the next section.

IV. Decoder Structure

The form of the path metric in (1) is a familiar one and several
decoder structures exist which maximize (or minimize) additive
path metrics of this form. One of the most popular ones is the
Viterbi decoder structure. Recall that the Viterbi decoder limits
the total number of candidate paths (solutions) to some finite
number $M$ where $M$ is the number of different values a solution
can take at any given time increment. This is done by using a
trellis structure that only includes allowed paths or transitions.
For the problem considered here $M$ would be the size of the
output alphabet of the π operator. In most applications where the
Viterbi decoder is used, the codewords are of fixed length and
therefore the candidate paths are of the same length. This is not
true in the current case. However, this problem can be resolved
rather simply by associating a pointer with each candidate path.
The pointer counts the number of bits used to form the path it is
associated with.

To see how this works consider the following example. Let the
input alphabet to the π operator be of size two; $S = \{s_0, s_1\}$.
Suppose the input sequence to the π operator is

$s_0\ s_0\ s_1\ s_0\ s_0$

then the output of the π operator will be

$a_0\ a_0\ a_1\ a_2\ a_0$

If the Huffman code for the π operator output is
$a_0$: 0, $a_1$: 10, $a_2$: 110, $a_3$: 111
then the transmitted binary sequence will be

0 0 10 110 0

Suppose there is an error in the fourth bit and the received
sequence is

00111100

The decoder operation is shown in Figure 1, where the metric
being used is the Hamming distance. The branches are labelled
with a pair of numbers. The first number is the accumulated
number of bits used by the path that includes that branch while
the second number is the Hamming distance between the received
bits and the candidate solution. The receiver assumes a starting
value of $a_0$. In the first step there are two possibilities, that the
transmitted word was $a_0$ or $a_1$. If we assume the transmitted word
was $a_0$ we use up one bit and the Hamming distance is zero. If $a_1$
is assumed then we use two bits and the Hamming distance is one.
Therefore, the lower branch (to $a_0$) is labelled 1,0 while the
branch to $a_1$ is labelled 2,1. This procedure is continued with
conflicts being resolved by picking the path with the lower
Hamming distance. The procedure is shown in Figure 1.
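The pointer mechanism can be sketched as follows. This is a minimal illustration under stated assumptions rather than the decoder of Figure 1: the number of symbols per line is taken as known (in the spirit of end of line synchronization), the winning path is required to use exactly the received number of bits, symbol $a_k$ is taken to encode (previous input, current input) as ($k$ div $N$, $k$ mod $N$), and ties are broken in favour of the path found first. The name `decode` is hypothetical.

```python
# Illustrative sketch: Viterbi search with a bit pointer stored per survivor path.
N = 2
HUFFMAN = {0: "0", 1: "10", 2: "110", 3: "111"}  # code used in the example


def decode(received, num_symbols, start_input=0):
    """Return (symbols, distance) for the best path using exactly len(received) bits.

    The state is the last emitted symbol; each survivor stores
    (Hamming distance so far, bits used so far, symbol path).
    """
    survivors = {None: (0, 0, [])}  # None marks the start state
    for _ in range(num_symbols):
        candidates = {}
        for last, (dist, ptr, path) in survivors.items():
            prev_input = start_input if last is None else last % N
            # Only symbols whose 'previous input' part matches are allowed.
            for sym in range(prev_input * N, prev_input * N + N):
                word = HUFFMAN[sym]
                chunk = received[ptr:ptr + len(word)]
                if len(chunk) < len(word):
                    continue  # path would run past the end of the received bits
                d = dist + sum(a != b for a, b in zip(word, chunk))
                best = candidates.get(sym)
                if best is None or d < best[0]:
                    candidates[sym] = (d, ptr + len(word), path + [sym])
        survivors = candidates
    complete = [s for s in survivors.values() if s[1] == len(received)]
    if not complete:
        return None
    dist, _, path = min(complete, key=lambda s: s[0])
    return path, dist


symbols, distance = decode("00111100", 5)
print(symbols, distance)
print([s % N for s in symbols])  # inverting the operator recovers the input indices
```

For the received sequence of the example, this sketch returns the transmitted symbol sequence at Hamming distance one, so the single bit error is corrected.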

V. Simulation Results

The techniques presented in this paper were applied to an image
coding scheme. The data compression scheme was a DPCM
system with a fixed four level nonuniform Max quantizer and a
one-tap predictor. The data compaction scheme is a sixteen-level
Huffman coder. The average rate for this system was 2.3 bits per
pixel. End of line synchronization is assumed for the receiver.
A block diagram of the system is shown in Figure 2.

The performance with both metrics is shown in Figure 3 and
Figure 4. Both figures plot the same results where Figure 3
emphasizes the performance in the low noise region and Figure 4
emphasizes performance at high channel error rates. The curves
are labeled "Approx 1," "Approx 2," and "No Protection." The
curve labeled Approx 1 is the performance curve for the system
which uses the metric approximation of (12) and (13). The curve
labeled Approx 2 is the system which uses the second approximation,
i.e., the Hamming distance between the received bits and the
candidate sequence elements. The curve labeled "No Protection"
is the system without the joint source/channel coding scheme.
Both metric approximations provide a high degree of protection
for low to moderate channel error rates. At high channel error
rates, while both the systems provide substantial performance
improvements over the unprotected system, the system with the
Hamming distance metric provides lower performance than the
system with the approximation of (12) and (13). However, as
mentioned before, this might be a small cost to pay for the
simplicity of implementation.

Figure 1. Decoding procedure for variable length

Abstract

Source coders and channel coders are generally designed
separately without reference to each other. This
approach is justified by a famous result of Shannon's.
However, there are many situations in practice in
which the assumptions upon which this result is based
are violated. Specifically, we examine the situation
where there is residual redundancy at the source coder
output. We have previously shown that this residual
redundancy can be used to provide error correction using
a Viterbi decoder. In this paper we present the second
half of the design: the design of encoders for this
situation. We show through simulation results that the
proposed coders consistently outperform conventional
source-channel coder pairs with gains of up to 12 dB at
high probability of error.


1 Introduction 

One of Shannon’s many fundamental contributions 
was his result that source coding and channel coding 
can be treated separately without any loss of perfor- 
mance for the overall system [1]. The basic design pro- 
cedure is to select a source encoder which changes the 
source sequence into iid bits followed by a channel en- 
coder which encodes the bits for reliable transmission 
over the channel. However, the separation argument 
no longer holds if either of the following two situations 
occur: 

i. The input to the source decoder is different from
the output of the source encoder, which happens
when the link between the source encoder and
source decoder is no longer error free, or

ii. The source coder output contains redundancy.

Jerry D. Gibson*

Information Systems Laboratory
and the Telecommunications Program
Dept. of Electrical Engineering
Stanford Univ., Stanford, CA 94305

* On leave from Dept. of Electrical Engr., Texas A&M Univ.

Case (i) occurs when the channel coder does not 
achieve zero error probability and case (ii) occurs 
when the source encoder is suboptimal. These two 
situations are common occurrences in practical sys- 
tems where source or channel models are imperfectly 
known, complexity is a serious issue, or significant de- 
lay is not tolerable. Approaches developed for such 
situations are usually grouped under the general head- 
ing of joint source/channel coding. 

Most joint source channel coding approaches 
can be classified in two main categories; (A) ap- 
proaches which entail the modification of the source 
coder/decoder structure to reduce the effect of chan- 
nel errors [2-10], and (B) approaches which examine 
the distribution of bits between the source and chan- 
nel coders [11, 12]. The first set of approaches can 
be divided still further into two classes. One class of 
approaches examines the modification of the overall 
structure [2-5], while the other deals with the modifi- 
cation of the decoding procedure to take advantage of 
the redundancy in the source coder output [6-10]. 

In this paper we present an approach to joint 
source/channel coder design, which belongs to cate- 
gory A, and hence we explore a technique for design- 
ing joint source/channel coders, rather than ways of 
distributing bits between source coders and channel 
coders. We assume that the two nonideal situations 
referred to earlier are present. For a nonideal source 
coder, we use MAP arguments to design a decoder 
which takes advantage of redundancy in the source 
coder output to perform error correction. We then 
use the decoder structure to infer the encoder design. 


2 The Design Criterion 

For a discrete memoryless channel (DMC), let
the channel input alphabet be denoted by
$A = \{a_0, a_1, \ldots, a_{M-1}\}$, and the channel input and output
sequences by $Y = \{y_0, y_1, \ldots, y_{L-1}\}$ and
$\hat{Y} = \{\hat{y}_0, \hat{y}_1, \ldots, \hat{y}_{L-1}\}$, respectively.
If $\mathcal{A} = \{A_i\}$ is the set of sequences
$A_i = \{\alpha_{i,0}, \alpha_{i,1}, \ldots, \alpha_{i,L-1}\}$,
$\alpha_{i,k} \in A$, then
the optimum receiver (in the sense of maximizing the
probability of making a correct decision) maximizes
$P[C]$, where

$P[C] = \sum_{A_i} P[C \mid \hat{Y} = A_i]\, P[\hat{Y} = A_i]$

This in turn implies that the optimum receiver maximizes
$P[C \mid \hat{Y}]$. When the receiver selects the output
to be $A_k$, then $P[C \mid \hat{Y}] = P[Y = A_k \mid \hat{Y}]$. Thus, the
optimum receiver selects the sequence $A_k$ such that

$P[Y = A_k \mid \hat{Y}] \geq P[Y = A_i \mid \hat{Y}] \quad \forall i.$

Lemma 1

Let $y_i$ be the input to a DMC. Given $y_{i-1}$, $y_i$
is conditionally independent of $y_{i-k}$, $k > 1$. If
$\hat{y}_0 = y_0$ then the optimum receiver selects a sequence
$A_i$ to maximize $\prod_{i=1}^{L-1} P(y_i \mid y_{i-1}, \hat{Y}_i)$
where $\hat{Y}_k = \{\hat{y}_k, \hat{y}_{k+1}, \ldots, \hat{y}_{L-1}\}$.

The lemma addresses the situation in case (ii), i.e.,
the situation in which the source coder output (which
is also the channel input sequence) contains redundancy.
Using this lemma, we can design a decoder
which will take advantage of dependence in the channel
input sequence. The lemma provides the mathematical
structure for the decoder. The physical structure
can be easily obtained by examining the quantity
to be maximized. The optimum decoder maximizes
$P(Y \mid \hat{Y})$ or equivalently $\log P(Y \mid \hat{Y})$, but

$\log P(Y \mid \hat{Y}) = \sum_i \log P(y_i \mid \hat{Y}_i, y_{i-1})$    (1)

which is similar in form to the path metric of a
convolutional decoder. Error correction using convolutional
codes is made possible by explicitly limiting the
possible codeword to codeword transitions, based on the
previous code input and the coder structure. In this
case, while there is no structure being imposed by the
encoder, there is sufficient residual structure in the
source coder output that can be used for error
correction. This structure can be quantified in light of
the Lemma. That is, the structure is reflected in the
conditional probabilities, and can be utilized via the
path metric in (1) in a decoder similar in structure
to a convolutional decoder. However, to implement
this decoder we need to be able to compute the path
metric. Unfortunately the quantity
$P(y_i \mid \hat{Y}_i, y_{i-1})$ is
difficult to estimate. We have therefore used various
approximations to this quantity with some success. In
[8, 9] $P(y_i \mid \hat{Y}_i, y_{i-1})$ is approximated by
$P(y_i \mid \hat{y}_i, y_{i-1})$
with excellent results. Other approximations can be
found in [13].
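The conditional probabilities that carry the residual structure have to be obtained from data before the metric can be evaluated. One simple possibility, shown here as a hedged sketch, is to estimate the first-order transition probabilities of the source coder output by relative frequencies over a training sequence. The estimator, the uniform fallback for symbols never seen in training, and the name `estimate_transitions` are assumptions made for this illustration and are not taken from the paper.

```python
def estimate_transitions(training, M):
    """Estimate P(y_n = b | y_{n-1} = a) by counting symbol pairs in a training sequence.

    training is a list of symbol indices in range(M).
    Rows for symbols that never occur as a predecessor fall back to uniform.
    """
    counts = [[0] * M for _ in range(M)]
    for prev, cur in zip(training, training[1:]):
        counts[prev][cur] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 1.0 / M for c in row])
    return probs


print(estimate_transitions([0, 0, 1, 0, 0, 1], 2))
```

A strongly non-uniform row in the resulting table indicates a large amount of residual redundancy, which is what the decoder exploits.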

In [9] we showed that the use of the decoder led 
to dramatic improvements under high error rate con- 
ditions. However at low error rates the performance 
improvement was from nonexistent to minimal. This 
is in contrast to standard error correcting approaches, 
in which the greatest performance improvements are 
at low error rates, with a rapid deterioration in per- 
formance at high error rates. In this work we combine 
the two approaches to develop a joint source channel 
codec which provides protection equal to the standard 
channel encoders at low error rates while providing 
significant error protection at high error rates. 

3 Proposed Encoder Structure 

In the conventional error protection approach we in- 
troduce structure in the transmitted bitstream. In the 
approach proposed in [9], we use the residual structure 
in the (generally nonbinary) source coder output se- 
quence. To combine the two approaches, we need to 
introduce additional structure without disturbing the 
structure already present. Because of the nature of the 
decoding approach, a convolutional encoder would be 
most appropriate for introducing structure. However, 
a standard binary convolutional encoder will tend to 
destroy the structure in the source coder output. To 
preserve the residual structure while introducing ad- 
ditional structure we propose to use nonbinary convo- 
lutional encoders (NCE) whose input alphabet is the 
output alphabet of the source coder. 

Let $x_n$, the input to the NCE, be selected from the
alphabet $A = \{0, 1, 2, \ldots, N-1\}$, and let $y_n$, the output
of the NCE, be selected from the alphabet
$S = \{0, 1, 2, \ldots, M-1\}$. Then, two of the proposed
NCEs can be described by the following mappings

1. $M = N^2$; $y_n = N x_{n-1} + x_n$

The number of bits required to represent the output
alphabet using a fixed length code is

$\lceil \log_2(M) \rceil = \lceil \log_2(N^2) \rceil = \lceil 2 \log_2(N) \rceil$

Therefore in terms of rate, this coder is equivalent to a
rate 1/2 convolutional encoder. The encoder memory
in bits is $2 \lceil \log_2(N) \rceil$ as each output value depends on
two input values.

As an example, consider the situation when $N = 4$.
Then $A = \{0, 1, 2, 3\}$ and $S = \{0, 1, 2, \ldots, 15\}$. Given
the input sequence $x_n$: 0 1 3 0 2 1 1 0 3 3 and assuming
the encoder is initialized with zeros, the output
sequence will be $y_n$: 0 1 7 12 2 9 5 4 3 15.

The encoder memory is four bits. Notice that while
the encoder output alphabet is of size $N^2$, at any given
instant the encoder can only emit one of $N$ different
symbols as should be the case for a rate 1/2
convolutional encoder. For example if $y_{n-1} = 0$, then $y_n$
will take on a value from $\{0, 1, 2, \ldots, N-1\}$. In general,
given a value for $y_{n-1}$, $y_n$ will take on a value
from $\{\alpha N, \alpha N + 1, \alpha N + 2, \ldots, \alpha N + N - 1\}$, where
$\alpha = y_{n-1} \pmod N$. This structure can be used by the
decoder to provide error protection. The encoder is
shown in Figure 1a.
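The rate 1/2 mapping is easy to reproduce. The following minimal sketch implements $y_n = N x_{n-1} + x_n$ with the encoder initialized to zero and checks the structural constraint described above; the function names are illustrative only.

```python
def nce_rate_half(xs, N, x_prev=0):
    """Rate 1/2 nonbinary convolutional encoder: y_n = N * x_{n-1} + x_n."""
    ys = []
    for x in xs:
        ys.append(N * x_prev + x)
        x_prev = x
    return ys


def bits_per_output(N):
    """Bits needed for a fixed length code over the N * N output letters."""
    return (N * N - 1).bit_length()


N = 4
xs = [0, 1, 3, 0, 2, 1, 1, 0, 3, 3]
ys = nce_rate_half(xs, N)
print(ys)

# Each output must lie in the sub-alphabet selected by the previous output.
print(all(cur // N == prev % N for prev, cur in zip(ys, ys[1:])))
print(bits_per_output(N))
```

The decoder can reject any received transition that violates this sub-alphabet constraint, which is where the added error protection comes from.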

2. $M = N^3$; $y_n = N^2 x_{2n-2} + N x_{2n-1} + x_{2n}$

The final encoder we consider is equivalent to a rate
2/3 convolutional coder. Notice that while the input
output relationship looks similar to a rate 1/3 encoder,
we generate one output for every two inputs. Thus,
while the number of bits needed to represent one letter
from the output alphabet is three times the bits
needed to represent a letter from the input alphabet,
because two input letters are represented by a single
output letter, the rate is 2/3. Again, assuming a value
of 4 for $N$, the output alphabet is of size 64, and for the
input sequence used previously, the output sequence
is $y_n$: 0 52 35 22 49 3.

The encoder memory is again 6 bits. A block diagram
of the encoder is shown in Figure 1b. The rate
of the encoder can also be inferred from the fact that
while the encoder output alphabet is of size $N^3$, at any
instant the encoder can transmit one of $N^2$ (instead
of $N$) symbols. Given a value for $y_{n-1}$, $y_n$ can take on
a value from the alphabet
$\{\gamma N^2, \gamma N^2 + 1, \ldots, \gamma N^2 + (N^2 - 1)\}$
where $\gamma = y_{n-1} \pmod N$.

4 Binary Encoding of the NCE Output 

We will make use of the residual structure in the 
source coder output (which is preserved in the NCE 
output) at the receiver. However, we can also make 
use of this structure in selecting binary codes for the 
NCE output. An intelligent assignment of binary 
codes can improve the error correcting performance 
of the system. Our strategy is to try to maximize the 
Hamming distance between codewords that are likely 
to be mistaken for one another. 

First we obtain a partition of the alphabet based 
on the fact that given a particular value for $y_{n-1}$, $y_n$ 
can only take on values from a subset of the full 
alphabet. To see this, consider the rate 1/2 NCE; then 
the alphabet $S$ can be partitioned into the following 
sub-alphabets: 

$S_j = \{jN, jN + 1, \ldots, jN + N - 1\}$, $j = 0, 1, \ldots, N - 1$ 

where the encoder will select letters from alphabet 
$S_j$ at time $n$ if $j = y_{n-1} \pmod{N}$. Now for each 
sub-alphabet we have to pick $N$ codewords out of 
$M$ ($= N^2$) possible choices. We first pick the 
sub-alphabet containing the most likely letter. The 
letters in the sub-alphabet are ordered according to their 
probability of occurrence. We assign a codeword $a$ 
from the list of available codewords to the most 
probable symbol. Then we assign the complement of $a$ to 
the next symbol on the list. Therefore the distance 
between the two most likely symbols in the list is 
$K = \lceil \log_2 M \rceil$ bits. We then pick a codeword $b$ from 
the list which is at a Hamming distance of $K/2$ from 
$a$ and assign it and its complement to the next two 
elements on the list. This process is continued with 
the selection of letters that are $K/2^k$ away from $a$ at 
the $k$th step until all letters in the sub-alphabet have 
a codeword assigned to them. We then pick the 
sub-alphabet that contains the next most likely letter. It is 
assigned the available codeword at maximum distance 
from $a$. The procedure for assigning codewords within 
the sub-alphabet is then repeated. The assignment for 
a rate 1/2 code with $N = 4$ is shown in Table 1. 


5 Simulation Results 

The proposed approach was simulated using a two-bit 
DPCM system as the source coder, and the three 
NCEs described in Section 3. The sources used were 
the standard test images USC Girl, USC Couple and a 
256x256 portion of Lena. The decoder structure used 
was that of a Viterbi decoder with branch metric $\log L$, where 

$$L = \frac{P(\hat{y}_i \mid y_i)\, P(y_i \mid y_{i-1}, y_{i-2})}{P(\hat{y}_i)}$$

where $y_i$ denotes the NCE output and $\hat{y}_i$ denotes 
the corrupted channel output. The probabilities 
$P(y_i \mid y_{i-1}, y_{i-2})$ were estimated using a training 
sequence. This requires estimating $MN^2$ probabilities, 
which were estimated using the USC Girl image. The 
test images were the USC Couple and Lena images. 

The proposed scheme was compared with a conventional 
source coder-convolutional coder combination. 
The source coder and source sequence were the 
same in both systems. The convolutional codes selected 
from [14] were the codes with maximal $d_{free}$ and the 
same rate and memory characteristics as the proposed 
NCEs. The performance measure was the 
signal-to-noise ratio (SNR) defined as 

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_i u_i^2}{\sum_i (u_i - \hat{u}_i)^2}$$

where $u_i$ is the input to the source encoder and $\hat{u}_i$ is 
the output of the source decoder. 
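As a small illustration of this performance measure, the following sketch computes the ratio in dB for two equal-length sample lists. The function name `snr_db` and the sample values are made up for the example; they are not taken from the simulations.

```python
import math


def snr_db(u, u_hat):
    """10*log10( sum(u_i^2) / sum((u_i - u_hat_i)^2) ), in dB."""
    signal = sum(v * v for v in u)
    noise = sum((v - w) ** 2 for v, w in zip(u, u_hat))
    if noise == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / noise)


u = [10, 20, 30, 40]       # hypothetical source samples
u_hat = [11, 19, 30, 42]   # hypothetical reconstructed samples
print(round(snr_db(u, u_hat), 2))  # 26.99
```

Here the signal energy is 3000 and the error energy is 6, so the ratio is 500, or about 26.99 dB.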

The results show consistent improvement in perfor- 
mance for the proposed system. At low probabilities 
of error both systems perform very well. At high 
probabilities of error ($> 10^{-2}$), however, there is a 
substantial improvement in performance when the proposed 
system is used. 

In Figures 2a and 2b we show the results of one 
of the simulations for the rate 1/2 codes. The bi- 
nary assignment of Table 1 was used in the simula- 
tion. Notice the flatness of the performance curve for 
the proposed system. While the proposed system con- 
sistently outperforms the conventional system, it is at 
higher probabilities of error that the differences really 
become significant. At a probability of error of 10 
there is almost a 6dB difference in the performance 
of the two systems! This “flattening out” of the per- 
formance curve makes the approach useful for a large 
variety of channel error conditions. 

Similar performance improvements can be seen for 
the rate 2/3 system of the second mapping. The per- 
formance curves are shown in Figure 3. Notice that 
again the proposed system consistently outperforms 
the conventional system. In this case at a probability 
of error of $10^{-1}$ the performance improvement is more 
than 12 dB! In fact, the proposed rate 2/3 system 
performs better than the conventional rate 1/2 system. 


6 Conclusion 

If the source and channel coder are designed in a 
“joint” manner, that is the design of each takes into ac- 
count the overall conditions (source as well as channel 
statistics), we can obtain excellent performance over 
a wide range of channel conditions. In this paper we 
have presented one such design. The resulting perfor- 
mance improvement seems to validate this approach. 


[2] E. Ayanoglu and R. M. Gray. IEEE Trans. Inform. 
Theory. IT-33:855-865. Nov. 1987. 

[3] T. C. Ancheta, Jr. Ph.D. dissertation, Dept. of 
Electrical Engr., Univ. of Notre Dame. Aug. 1977. 

[4] K-Y Chang and R. W. Donaldson. IEEE Trans. 
Commun. COM-20:338-350. June 1972. 

[5] D. J. Goodman and C. E. Sundberg. Bell Syst. 
Tech. J. 62:2017-203. Sept. 1983. 

[6] R. C. Reininger and J. D. Gibson. IEEE Trans. 
Commun. COM-31:572-577. April 1983. 

[7] R. Steele et al. IEEE Trans. Commun. 
COM-27:252-255. Jan. 1979. 

[8] K. Sayood and J. C. Borkenhagen. Proceedings 
IEEE ICC '86. 1888-1892. June 1986. 

[9] K. Sayood and J. C. Borkenhagen. IEEE 
Transactions on Communications. COM-39. June 1991. 

[10] K. Sayood and J. D. Gibson. Proc. 22nd Annual 
CISS, Princeton, NJ. 380-385. Mar. 1988. 

[11] J. W. Modestino et al. IEEE Trans. Commun. 
COM-29:1262-1274. Sept. 1981. 

[12] D. Comstock and J. D. Gibson. IEEE Trans. 
Commun. COM-32:856-861. July 1984. 

[13] K. Sayood, J. D. Gibson, and F. Liu. Proc. 22nd 
Annual Asilomar Conference on Circuits, 
Systems, and Computers. 102-106. Nov. 1988. 

[14] S. Lin and D. J. Costello. Error Control Coding. 
Prentice Hall. 1983. 


Table 1: Codeword Assignments 

Symbol   Code     Symbol   Code
  0      0000       8      1011
  1      0011       9      0111
  2      1100      10      0100
  3      1111      11      1000
  4      1110      12      0101
  5      1101      13      1001
  6      0001      14      1010
  7      0010      15      0110

Figure 1. Proposed Nonbinary Convolutional Encoders 

Abstract 

Source coders and channel coders are generally designed separately without reference to each 
other. This approach is justified by a famous result of Shannon's. However, there are many 
situations in practice in which the assumptions upon which this result is based are violated. Specifically, 
we examine the situation where there is residual redundancy at the source coder output. We have 
previously shown that this residual redundancy can be used to provide error correction using a 
Viterbi decoder. In this paper we present the second half of the design: the design of encoders for 
this situation. We show through simulation results that the proposed coders consistently outperform 
conventional source-channel coder pairs, with gains of up to 12 dB at high probability of error. 

1 Introduction 

*This work was supported in part by NASA Lewis Research Center (NAG 3-806) and NASA Goddard Space 
Flight Center (NAG 5-916). 

One of Shannon's many fundamental contributions was his result that source coding and channel 
coding can be treated separately without any loss of performance for the overall system [1]. The 
basic design procedure is to select a source encoder which changes the source sequence into a series 
of independent, equally likely binary digits followed by a channel encoder which accepts binary 
digits and puts them into a form suitable for reliable transmission over the channel. However, the 
separation argument no longer holds if either of the following two situations occurs: 

i. The input to the source decoder is different from the output of the source encoder, which 
happens when the link between the source encoder and source decoder is no longer error free, 
or 

ii. The source coder output contains redundancy. 

Case (i) occurs when the channel coder does not achieve zero error probability and case (ii) 
occurs when the source encoder is suboptimal. These two situations are common occurrences in 
practical systems where source or channel models are imperfectly known, complexity is a serious 
issue, or significant delay is not tolerable. Approaches developed for such situations are usually 
grouped under the general heading of joint source/channel coding. 

Most joint source/channel coding approaches can be classified in two main categories: (A) 
approaches which entail the modification of the source coder/decoder structure to reduce the effect 
of channel errors, and (B) approaches which examine the distribution of bits between the source 
and channel coders. The first set of approaches can be divided still further into two classes. One 
class of approaches examines the modification of the overall structure, while the other deals with 
the modification of the decoding procedure to take advantage of the redundancy in the source 
coder output. 

To the first class belongs the work of Dunham and Gray [2], who proved the existence of joint source 
channel trellis coding systems for certain fidelity criteria, and a design of a joint source channel 
trellis coder presented by Ayanoglu and Gray [3], where the design procedure is the generalized 
Lloyd algorithm. Further, Massey [4] and Ancheta [5] showed that for distortionless transmission 
of the source using linear joint source channel encoders, equivalent performance can be obtained 
with a significant reduction in complexity. Chang and Donaldson [6] propose modifications to the 
DPCM system to reduce the effect of channel errors, while Kurtenbach and Wintz [7] and Farvardin 
and Vaishampayan [8] study the problem of optimum quantizer design for noisy channels. Goodman 
and Sundberg [9,10] propose an embedded DPCM system which consists of a two bit DPCM and 
a two bit PCM system in parallel. 

In the second class of category A, we include the work of Reininger and Gibson [11], who use 
the fact that coefficients in neighboring blocks in a transform coding scheme will not vary greatly, 
and thus use coefficients from neighboring blocks to correct a possible error, and the work of Steele, 
Goodman and McGonegal [12,13], who propose a difference detection and correction scheme for 
broadcast quality speech. In this scheme the receiver infers an error whenever an individual sample 
to sample difference is greater than the mean squared difference of a 21 sample sliding block. When 
an error is detected, the received sample is replaced by the output of a smoothing circuit. Ngan 
and Steele [14] use a similar method for recovering from errors in an image transmission system. 
Sayood and Borkenhagen [16, 17] use the redundancy at the source coder output to perform 
sequence estimation. Sayood and Gibson [18] examine “desirable” properties for encoders which 
enhance sequential estimation performance. 

The work of Modestino, Daut and Vickers [19] belongs to category B. In their study of transform 
coding they examine tradeoffs between allocating bits for source and channel coding. Comstock 
and Gibson [20] extend this work and provide an explicit mechanism for allocating bits between 
a source coder and a Hamming channel coder. Additionally, Moore and Gibson [21] study the 
allocation of bits between a DPCM coder and self orthogonal convolutional coding. 

In this paper we present an approach to joint source/channel coder design which belongs to 
category A: we explore a technique for designing joint source/channel coders, rather 
than ways of distributing bits between source coders and channel coders. We assume that the 
two nonideal situations referred to earlier are present. For a nonideal source coder, we use MAP 
arguments to design a decoder which takes advantage of redundancy in the source coder output to 
perform error correction. We then use the decoder structure to infer the encoder design. 

2 The Design Criterion 

For a discrete memoryless channel (DMC), let the channel input alphabet be denoted by $A$, 
and the channel input and output sequences by $Y = \{y_0, y_1, \ldots, y_{L-1}\}$ and 
$\hat{Y} = \{\hat{y}_0, \hat{y}_1, \ldots, \hat{y}_{L-1}\}$, respectively. If $\mathcal{A} = \{A_i\}$ is the set of sequences 
$A_i = \{\alpha_{i,0}, \alpha_{i,1}, \ldots, \alpha_{i,L-1}\}$, $\alpha_{i,k} \in A$, then the optimum receiver (in the sense of 
maximizing the probability of making a correct decision) maximizes $P[C]$, where 

$$P[C] = \sum_{A_i} P[C \mid \hat{Y}]\, P[\hat{Y}].$$

This in turn implies that the optimum receiver maximizes $P[C \mid \hat{Y}]$. When the receiver selects the 
output to be $A_k$, then $P[C \mid \hat{Y}] = P[Y = A_k \mid \hat{Y}]$. Thus, the optimum receiver selects the sequence 
$A_k$ such that 

$$P[Y = A_k \mid \hat{Y}] \ge P[Y = A_i \mid \hat{Y}] \quad \forall i.$$


Lemma 1 

Let $y_i$ be the input to a DMC. Given $y_{n-1}$, $y_n$ is conditionally independent of $y_{n-k}$, $k > 1$. If 
$\hat{y}_0 = y_0$ then the optimum receiver selects a sequence $A_i$ to maximize $\prod_i P(y_i \mid y_{i-1}, \hat{Y}_i)$, where 
$\hat{Y}_k = \{\hat{y}_k, \hat{y}_{k+1}, \ldots, \hat{y}_{L-1}\}$. (The proof is given in the appendix.) 

The lemma addresses the situation in case (ii), i.e., the situation in which the source coder 
output (which is also the channel input sequence) contains redundancy. Using this lemma, we can 
design a decoder which will take advantage of dependence in the channel input sequence. The 
lemma provides the mathematical structure for the decoder. The physical structure can be easily 
obtained by examining the quantity to be maximized. The optimum decoder maximizes $P(Y \mid \hat{Y})$ 
or equivalently $\log P(Y \mid \hat{Y})$, but 

$$\log P(Y \mid \hat{Y}) = \sum_i \log P(y_i \mid \hat{Y}_i, y_{i-1}) \qquad (1)$$

which is similar in form to the path metric of a convolutional decoder. Error correction using 
convolutional codes is made possible by explicitly limiting the possible codeword to codeword 
transitions, based on the previous code input and the coder structure. At the receiver the decoder 
compares the received data stream to the a priori information about the code structure. The 
output of the decoder is the sequence that is most likely to be the transmitted sequence. In the 
case where there is residual structure in the source coder output, while there may not be an explicit 
limitation on the codeword to codeword transition, the structure makes some sequences more likely 
to be the transmitted sequence, given a particular received sequence. In other words, while there 
is no structure being imposed by the encoder, there is sufficient residual structure in the source 
coder output that can be used for error correction. This structure can be quantified in light of 
the Lemma. That is, the structure is reflected in the conditional probabilities, and can be utilized 
via the path metric in (1) in a decoder similar in structure to a convolutional decoder. However, 
to implement this decoder we need to be able to compute the path metric. Unfortunately the 
quantity $P(y_i \mid \hat{Y}_i, y_{i-1})$ is difficult to estimate. We have therefore used various approximations to 
this quantity with some success. In [16, 17] $P(y_i \mid \hat{Y}_i, y_{i-1})$ is approximated by $P(y_i \mid \hat{y}_i, y_{i-1})$ with 
excellent results. Other approximations can be found in [22]. In the current work we use the 
following approximation for the branch metric $L$: 

$$L = \log \frac{P(\hat{y}_i \mid y_i)\, P(y_i \mid y_{i-1}, y_{i-2})}{P(\hat{y}_i)}$$
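A minimal sketch of how such a branch metric might be evaluated is given below. It assumes, purely for illustration, that the channel is a binary symmetric channel with bit error probability `p` acting on fixed-length binary codewords, and that the trained transition probabilities and the received-word probabilities are available as lookup tables. The names `bsc_likelihood`, `branch_metric`, `trans`, and `prior` are hypothetical.

```python
import math


def bsc_likelihood(received, sent, nbits, p):
    """P(received | sent) for nbits-bit words over a binary symmetric channel (assumed model)."""
    d = bin(received ^ sent).count("1")  # Hamming distance between the two words
    return (p ** d) * ((1.0 - p) ** (nbits - d))


def branch_metric(received, y, y_prev1, y_prev2, trans, prior, nbits, p):
    """L = log[ P(r | y) * P(y | y_prev1, y_prev2) / P(r) ].

    trans[(y_prev2, y_prev1, y)] is the trained estimate of P(y_i | y_{i-1}, y_{i-2});
    prior[r] is the probability of receiving the word r.
    """
    num = bsc_likelihood(received, y, nbits, p) * trans.get((y_prev2, y_prev1, y), 0.0)
    if num == 0.0:
        return -math.inf  # transition never observed in training
    return math.log(num / prior[received])


# Toy tables with made-up numbers, only to show the call.
trans = {(0, 0, 0): 0.5}
prior = {0: 0.25}
print(branch_metric(0, 0, 0, 0, trans, prior, nbits=2, p=0.1))
```

A Viterbi-style decoder would accumulate these values along each candidate path and keep the path with the largest sum.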


In [17] we showed that the use of the decoder led to dramatic improvements under high error 
rate conditions. However at low error rates the performance improvement was from nonexistent 
to minimal. This is in contrast to standard error correcting approaches, in which the greatest 
performance improvements are at low error rates, with a rapid deterioration in performance at 
high error rates. In this work we combine the two approaches to develop a joint source channel 
codec which provides protection equal to the standard channel encoders at low error rates while 
providing significant error protection at high error rates. 

3 Convolutional Encoders and Joint Source/Channel Decoder 

As convolutional coders provide excellent error protection at low error rates, and have a decoder 
structure similar to the JSC decoder, one way we can combine the two approaches is to obtain 
the transition probabilities of the convolutional encoder output and use the Joint Source/Channel 
(JSC) decoder described above instead of the conventional convolutional decoder. We simulated 
this approach using a two bit DPCM system as the source encoder. We used the three images 
shown in Figure 1 as the source. The USC Girl image was used for training (obtaining the requisite 
transition probabilities) and the Lena 256 and USC Couple images for testing. The output of the 
DPCM system was encoded using a (2,1,3) convolutional encoder with connection vectors 

$$g^{(1)} = 64, \qquad g^{(2)} = 74.$$

The convolutional encoder was obtained from [23]. The performance of the different systems was 
evaluated using two different measures. One was the reconstruction signal-to-noise ratio (RSNR) 
defined as 

$$\mathrm{RSNR} = 10 \log_{10} \frac{\sum_i u_i^2}{\sum_i (u_i - \hat{u}_i)^2}$$

where $u_i$ is the input to the source coder (source image) and $\hat{u}_i$ is the output of the source decoder 
(reconstructed image). The other performance measure was the decoded error probability. The 
received sequence was decoded using a standard convolutional decoder and the JSC decoder. A 
block diagram of the system is shown in Figure 2. The results are presented in Figure 3. Notice the 
significant improvement in performance when the JSC decoder is used instead of the convolutional 
decoder. At a probability of error of 0.1 there is an improvement of about 5dB for the training set 
and an improvement of about 4 dB for the test set! 

The simulations were repeated with a rate 2/3 (3,2,2) convolutional coder [23] with connection 
vectors 

$$g_1^{(1)} = 7, \quad g_1^{(2)} = 1, \quad g_1^{(3)} = 4,$$
$$g_2^{(1)} = 2, \quad g_2^{(2)} = 5, \quad g_2^{(3)} = 7.$$

The results are presented in Figure 4. Notice that while the code adds less redundancy (rate 2/3 as 
opposed to rate 1/2), the performance using the JSC decoder is actually better! The reason for this lies in the 
fact that the JSC decoder is making use of the structure in the nonbinary output of the source 
coder. When we used the (2,1,3) coder we destroyed some of this structure because the source 
coder was putting out two bit words while the channel coder was coding the input one bit at a 
time. However, in the case of the (3,2,2) coder, the input alphabet of the channel coder exactly 
matches the output alphabet of the source coder. Thus the structure in the source coder output 
is preserved in the channel coder output/channel input, providing better channel error protection. 
To verify this we conducted another set of simulations with a rate 1/2 (4,2,1) convolutional code 
with connection vectors 

$$g_1^{(1)} = 6, \quad g_1^{(2)} = 0, \quad g_1^{(3)} = 6, \quad g_1^{(4)} = 4,$$
$$g_2^{(1)} = 0, \quad g_2^{(2)} = 6, \quad g_2^{(3)} = 4, \quad g_2^{(4)} = 2.$$

In this case again there is a one-to-one match between the source coder output and the channel 
coder input, and the results shown in Figure 5 reflect this fact. There is about a two dB 
improvement at high error rates over the (2,1,3) rate 1/2 code, and about a one dB improvement over the 
rate 2/3 code. These results justify the contention that for best use of the JSC decoder the input 
alphabet size of the channel coder should be the same as the size of the output alphabet of the 
source coder. In the next section we propose a general channel coder design which is motivated by 
this requirement. 

4 A Modified Convolutional Encoder 

Given that the preservation of the structure in the source coder output requires the channel coder 
input alphabet to have a one-to-one match with the generally nonbinary source coder, we propose 
a general nonbinary convolutional encoder (NCE) whose input alphabet has the requisite property. 

Let $x_n$, the input to the NCE, be selected from the alphabet $A = \{0, 1, 2, \ldots, N - 1\}$, and let 
$y_n$, the output of the NCE, be selected from the alphabet $S = \{0, 1, 2, \ldots, M - 1\}$. Then 
the proposed NCEs can be described by the following mappings. 

1. $M = N^2$; $y_n = N x_{n-1} + x_n$ 

The number of bits required to represent the output alphabet using a fixed length code is 

$$\lceil \log_2 M \rceil = \lceil \log_2 N^2 \rceil = \lceil 2 \log_2 N \rceil$$

Therefore in terms of rate, this coder is equivalent to a rate 1/2 convolutional encoder. The encoder 
memory in bits is $2\lceil \log_2 N \rceil$ as each output value depends on two input values. 

As an example, consider the situation when $N = 4$. Then $A = \{0, 1, 2, 3\}$ and $S = \{0, 1, 2, \ldots, 15\}$. 
Given the input sequence $x_n$: 0 1 3 0 2 1 1 0 3 3 and assuming the encoder is initialized with 
zeros, the output sequence will be $y_n$: 0 1 7 12 2 9 5 4 3 15. 

The encoder memory is four bits. Notice that while the encoder output alphabet is of size 
$N^2$, at any given instant the encoder can only emit one of $N$ different symbols, as should be 
the case for a rate 1/2 convolutional encoder. For example, if $y_{n-1} = 0$, then $y_n$ will take on a 
value from $\{0, 1, 2, \ldots, N - 1\}$. In general, given a value for $y_{n-1}$, $y_n$ will take on a value from 
$\{\alpha N, \alpha N + 1, \alpha N + 2, \ldots, \alpha N + N - 1\}$, where $\alpha = y_{n-1} \pmod{N}$. This structure can be used by the 
decoder to provide error protection. The encoder is shown in Figure 1a. 

2. $M = N^3$; $y_n = N^2 x_{n-2} + N x_{n-1} + x_n$ 

This encoder is equivalent to a rate 1/3 convolutional encoder with an encoder memory in bits 
of $3\lceil \log_2 N \rceil$. With $N = 4$ as in the previous example, the output alphabet for the NCE is 

$S = \{0, 1, 2, \ldots, 63\}$ 

and the output sequence for the same input sequence is 

$y_n$: 0 1 7 28 50 9 37 20 19 15 
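The rate 1/3 mapping can be checked against this output sequence with a short Python sketch. It reads the mapping as $y_n = N^2 x_{n-2} + N x_{n-1} + x_n$ with a zero initial state, which is the reading that reproduces the sequence listed above; the function name `nce_rate_third` is hypothetical.

```python
def nce_rate_third(x, N):
    """Rate 1/3 NCE: y_n = N^2 * x_{n-2} + N * x_{n-1} + x_n, zero initial state."""
    x1 = x2 = 0  # x_{n-1} and x_{n-2}
    y = []
    for sym in x:
        y.append(N * N * x2 + N * x1 + sym)
        x2, x1 = x1, sym
    return y


N = 4
y = nce_rate_third([0, 1, 3, 0, 2, 1, 1, 0, 3, 3], N)
print(y)  # [0, 1, 7, 28, 50, 9, 37, 20, 19, 15]

# Given y_{n-1}, the next output lies in {bN, ..., bN + N - 1} with b = y_{n-1} mod N^2.
assert all((a % (N * N)) * N <= b < (a % (N * N)) * N + N for a, b in zip(y, y[1:]))
```

The assertion illustrates that, although the output alphabet has $N^3 = 64$ letters, only $N = 4$ of them can follow any given output.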

The encoder memory is six bits. In this case even though the encoder output alphabet is of size 
$N^3$, at any instant the encoder can only emit one of $N$ symbols. In general, given a value for 
$y_{n-1}$, $y_n$ will take on a value from $\{\beta N, \beta N + 1, \ldots, \beta N + N - 1\}$, where $\beta = y_{n-1} \pmod{N^2}$. A block 
diagram of the encoder is shown in Figure 6. 

3. $M = N^3$; $y_n = N^2 x_{2n} + N x_{2n-1} + x_{2n-2}$ 


The final encoder we consider is equivalent to a rate 2/3 convolutional coder. Notice that 
while the input output relationship looks similar to a rate 1/3 encoder, we generate one output for 
every two inputs. Thus, while the number of bits needed to represent one letter from the output 
alphabet is three times the bits needed to represent a letter from the input alphabet, because two 
input letters are represented by a single output letter, the rate is 2/3. Again, assuming a value of 
4 for N, the output alphabet is of size 64, and for the input sequence used previously, the output 
sequence is y n : 0 52 35 22 49 3. 
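The two-inputs-per-output behaviour can be illustrated with the sketch below, which reads the mapping as $y_n = N^2 x_{2n} + N x_{2n-1} + x_{2n-2}$ with zeros before the start of the sequence. The function name `nce_rate_two_thirds` is hypothetical. The sketch emits only the outputs whose newest input $x_{2n}$ is present, which gives the first five values of the listed sequence; how the encoder is flushed at the end of the input is not stated in the text, so the last listed value is not modeled.

```python
def nce_rate_two_thirds(x, N):
    """Rate 2/3 NCE: y_n = N^2 * x_{2n} + N * x_{2n-1} + x_{2n-2}, zeros before the start."""
    def at(i):
        return x[i] if i >= 0 else 0

    # One output per pair of inputs; stop once x_{2n} would run past the end of x.
    return [N * N * at(2 * n) + N * at(2 * n - 1) + at(2 * n - 2)
            for n in range((len(x) + 1) // 2)]


print(nce_rate_two_thirds([0, 1, 3, 0, 2, 1, 1, 0, 3, 3], 4))  # [0, 52, 35, 22, 49]
```

Ten input letters produce five output letters here, each needing three times as many bits as an input letter, which is the rate 2/3 relationship described above.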

The encoder memory is again 6 bits. The rate of the encoder can also be inferred from the fact 
that while the encoder output alphabet is of size $N^3$, at any instant the encoder can transmit one 
of $N^2$ (instead of $N$) symbols. Given a value for $y_{n-1}$, $y_n$ can take on a value from the alphabet 
$\{\gamma N^2, \gamma N^2 + 1, \ldots, \gamma N^2 + (N^2 - 1)\}$, where $\gamma = y_{n-1} \pmod{N}$. 


5 Binary Encoding of the NCE Output 

We will make use of the residual structure in the source coder output (which is preserved in the NCE 
output) at the receiver. However, we can also make use of this structure in selecting binary codes 
for the NCE output. An intelligent assignment of binary codes can improve the error correcting 
performance of the system as can be seen from the following example. 

Let $N$ be 2, and let us use the rate 1/2 NCE. In this case if $y_n = 0$, $y_{n+1}$ cannot be 2 or 3 
because $y_n = 0$ means $x_n = 0$, and $y_{n+1} = 2$ or 3 means $x_n = 1$. Thus a decoded sequence cannot 
have 2 or 3 following 0. 

Let us assign fixed length codewords to the NCE outputs as 

0 : 00 ; 1 : 01 ; 2 : 10 ; 3 : 11 


Now suppose the transmitted sequence was the all zero sequence, the metric used was the 
Hamming distance, and the received sequence is 00001000000000; that is, there is an error in 
the fifth bit. If the receiver decoded the first four bits as 0,0 then it cannot decode the fifth and 
sixth bits as 2 for the reason noted above. The only two options are decoding them as 0 or 1. If 
we decoded them as 0, we could continue decoding the rest of the sequence as 0,0..., and the 
Hamming distance between the received and decoded sequence would be one. If we decoded them 
as 1, we would have to decode the next set of two bits as 2 or 3 because 0 cannot follow 1. Decoding 
as 2 gives the smallest Hamming distance so we decode the seventh and eighth bit as 2. This gives 
a total Hamming distance of two for the incorrect path. Thus the receiver will select the correct 
path (the path with the smallest Hamming distance). 

If the assignment had been chosen as 

0 : 00 ; 1 : 11 ; 2 : 10 ; 3 : 01 

then the Hamming distance for the closest incorrect path would have been three instead of two. 
When each allowable sequence is equally likely, there is little reason to prefer one particular assign- 
ment over others. However, when certain sequences are more likely to occur than others, it would 
be useful to make assignments which increase the ‘distance’ between likely sequences. While, for 
small alphabets it is a simple matter to assign the optimum binary codewords by inspection, this 
becomes computationally impossible for larger alphabets. We use a rather simple heuristic which, 
while not optimal, provides good results. 

The number of $M$-bit codewords that have to be assigned is exactly $2^M$ (here $M$ denotes the 
codeword length in bits), so every available codeword is used. Our strategy 
is therefore to try to maximize the Hamming distance between codewords that are likely to be 
mistaken for one another. 
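For the $N = 2$ example, the effect of the two codeword assignments can be checked by brute force. The sketch below reads "the closest incorrect path" as the path that leaves the all-zero path and later rejoins it with the fewest 1 bits, that is, the smallest Hamming distance from the transmitted all-zero sequence. That reading, the search depth `max_len`, and the function names are assumptions made for the illustration.

```python
from itertools import product


def encode(x):
    """Rate 1/2 NCE with N = 2: y_n = 2 * x_{n-1} + x_n, zero initial state."""
    prev, y = 0, []
    for b in x:
        y.append(2 * prev + b)
        prev = b
    return y


def closest_incorrect_weight(code, max_len=6):
    """Fewest 1 bits on any path that departs from the all-zero path and returns to it."""
    best = None
    for n in range(1, max_len + 1):
        for bits in product((0, 1), repeat=n):
            if bits[0] == 0:
                continue  # the path must diverge at the first step
            y = encode(list(bits) + [0])  # the trailing 0 brings the encoder back to state 0
            w = sum(code[s].count("1") for s in y)
            best = w if best is None else min(best, w)
    return best


natural = {0: "00", 1: "01", 2: "10", 3: "11"}
alternative = {0: "00", 1: "11", 2: "10", 3: "01"}
print(closest_incorrect_weight(natural), closest_incorrect_weight(alternative))  # 2 3
```

Under this reading the second assignment pushes the nearest competing path from a distance of two to a distance of three, which is the kind of gain the heuristic tries to obtain for larger alphabets.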

First we obtain a partition of the alphabet based on the fact that given a particular value for 
$y_{n-1}$, $y_n$ can only take on values from a subset of the full alphabet. To see this, consider the rate 
1/2 NCE; then the alphabet $S$ can be partitioned into the following sub-alphabets: 

$S_0 = \{0, 1, 2, \ldots, N - 1\}$ 

$S_1 = \{N, N + 1, \ldots, 2N - 1\}$ 

$\vdots$ 

$S_{N-1} = \{N(N - 1), N(N - 1) + 1, \ldots, N^2 - 1\}$ 

where the encoder will select letters from alphabet $S_j$ at time $n$ if $j = y_{n-1} \pmod{N}$. Now for 
each sub-alphabet we have to pick $N$ codewords out of $M$ ($= N^2$) possible choices. We first pick 
the sub-alphabet containing the most likely letter. The letters in the sub-alphabet are ordered 
according to their probability of occurrence. We assign a codeword $a$ from the list of available 
codewords to the most probable symbol. Then we assign the complement of $a$ to the next symbol on 
the list. Therefore the distance between the two most likely symbols in the list is $K = \lceil \log_2 M \rceil$ 
bits. We then pick a codeword $b$ from the list which is at a Hamming distance of $K/2$ from $a$ and 
assign it and its complement to the next two elements on the list. This process is continued with 
the selection of letters that are $K/2^k$ away from $a$ at the $k$th step until all letters in the sub-alphabet 
have a codeword assigned to them. As an example, consider the case where $N = 4$. The partitions 
are 


$S_0 = \{0, 1, 2, 3\}$ 

$S_1 = \{4, 5, 6, 7\}$ 

$S_2 = \{8, 9, 10, 11\}$ 

$S_3 = \{12, 13, 14, 15\}$. 

Assuming that 0 is the most probable symbol, we start by assigning codewords to the $S_0$ 
sub-alphabet. Suppose 

$P(0) > P(1) > P(2) > P(3)$. 

We first pick a 4 bit codeword for 0 as 0000. The next most probable symbol in this sub-alphabet 
is 1; therefore the codeword for 1 is the complement of the codeword for 0; 1:1111. The codeword 
for 2 is at a Hamming distance of two from the codeword for 0. The codeword 0011 satisfies this 
requirement; therefore the codeword for 2 is 0011 and the codeword for 3 is 1100. Suppose the next 
symbol which is close in probability to the symbol 0 is 4. We select the sub-alphabet containing that 
symbol, which is $S_1$. To the symbol 4 we assign a codeword from the list of unassigned codewords 
which is furthest from the codeword for 0. There are several possibilities for this; we pick 1011. 
We then follow the same procedure for the $S_1$ sub-alphabet. Continuing in this manner we get the 
assignments shown in Table 1. 
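The distance structure of the Table 1 assignment can be inspected with the short sketch below. The codeword list is transcribed from Table 1, with the entry for symbol 3 read as 1111, which is the only 4-bit word not used elsewhere in the table; the function names are hypothetical.

```python
from itertools import combinations

# Table 1 codewords indexed by symbol 0..15 (symbol 3 read as 1111).
TABLE_1 = ["0000", "0011", "1100", "1111", "1110", "1101", "0001", "0010",
           "1011", "0111", "0100", "1000", "0101", "1001", "1010", "0110"]


def hamming(a, b):
    """Number of bit positions in which two equal-length codewords differ."""
    return sum(c != d for c, d in zip(a, b))


def min_distance_per_subalphabet(table, N):
    """Minimum pairwise Hamming distance inside each S_j = {jN, ..., jN + N - 1}."""
    return [min(hamming(a, b) for a, b in combinations(table[j * N:(j + 1) * N], 2))
            for j in range(N)]


print(min_distance_per_subalphabet(TABLE_1, 4))  # [2, 2, 2, 2]
assert len(set(TABLE_1)) == 16  # every 4-bit codeword is used exactly once
```

Since all sixteen 4-bit words are in use, some pairs of symbols must differ in a single bit; the assignment arranges for the symbols that can compete at the same time instant, those within one sub-alphabet, to differ in at least two bits.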

6 Simulation Results 

The proposed approach was simulated using the same setup as was used in the preceding simula- 
tions. The rate 1/2 NCE and rate 2/3 NCE were simulated. The results are shown in Figures 7 
and 8. Notice that the performance for the rate 1/2 NCE is about the same or slightly better than 
the performance of the (4,2,1) convolutional code. Similarly the performance of the rate 2/3 NCE 
is about the same as that of the rate 2/3 (3,2,2) convolutional code. Note that the memory requirements for 
both the NCE and the convolutional coders in both cases are the same. While the performance of 
the NCEs and the properly matched convolutional coders are about the same, the NCEs can be 
designed using a general algorithm for a source coder with any alphabet size. 

7 Conclusion 

If the source and channel coder are designed in a “joint” manner, that is the design of each takes 
into account the overall conditions (source as well as channel statistics), we can obtain excellent 
performance over a wide range of channel conditions. In this paper we have presented one such 
design. The resulting performance improvement seems to validate this approach, with a “flattening 
out” of the performance curves. This flattening out of the performance curves makes the approach 
useful for a large variety of channel error conditions. 

Table 1: Codeword Assignments 

Symbol   Code     Symbol   Code
  0      0000       8      1011
  1      0011       9      0111
  2      1100      10      0100
  3      1111      11      1000
  4      1110      12      0101
  5      1101      13      1001
  6      0001      14      1010
  7      0010      15      0110

Abstract 

Differential encoding techniques are fast and easy to implement. 
However, a major problem with the use of differential encoding 
for images is the rapid edge degradation encountered when us- 
ing such systems. This makes differential encoding techniques 
of limited utility especially when coding medical or scientific 
images, where edge preservation is of utmost importance. We 
present a simple, easy to implement differential image coding 
system with excellent edge preservation properties. The coding 
system can be used over variable rate channels which makes it 
especially attractive for use in the packet network environment. 


Introduction 

The transmission and storage of digital images requires an enor- 
mous expenditure of resources, necessitating the use of compres- 
sion techniques. These techniques include relatively low com- 
plexity predictive techniques such as Adaptive Differential Pulse 
Code Modulation (ADPCM) and its variations, as well as rel- 
atively higher complexity techniques such as transform coding 
and vector quantization [1,2]. Most compression schemes were 
originally developed for speech and their application to images is 
at times problematic. This is especially true of the low complexity
predictive techniques. A good example of this is the highly
popular ADPCM scheme. Originally designed for speech [3], it
has been used with other sources with varying degrees of success.
A major problem with its use in image coding is the rapid
degradation in quality whenever an edge is encountered. Edges 
are perceptually very important and occur quite often in most 
images. Therefore, the degradation of edges can be perceptually 
very annoying. If the images under consideration contain medi- 
cal or scientific data, the problem becomes even more important, 
as edges provide position information which may be crucial to
the viewer. This poor edge reconstruction quality has been a
major factor in preventing ADPCM from becoming as popular 
for image coding as it is for speech coding. 

While good edge reconstruction capability is an important 
requirement for image coding schemes, another requirement that 
is gaining in importance with the proliferation of packet switched 
networks, is the ability to encode the image at different rates. 
In a packet switched network, the available channel capacity is
not a fixed quantity, but rather fluctuates as a function of the
load on the network. The compression scheme must therefore be 
capable of operating at different rates as the available capacity 
changes. This means that it should be able to take advantage 
of increased capacity when it becomes available while providing 
graceful degradation when the rate decreases to match decreased
available capacity. 

In this paper we describe a DPCM based coding scheme 
which has the desired properties listed above. It is a low com- 
plexity scheme with excellent edge preservation in the recon- 
structed image. It takes full advantage of the available channel 
capacity, providing lossless compression when sufficient capacity
is available, and very graceful degradation when a reduction in
rate is required. 

Notation and Problem Formulation 

The DPCM system consists of two main blocks, the quantizer 
and the predictor (see Fig. 1). The predictor uses the correlation 
between samples of the waveform to predict the next sample 
value. This predicted value is removed from the waveform at 
the transmitter and reintroduced at the receiver. The prediction 
error is quantized to one of a finite number of values which is 
coded and transmitted to the receiver. The difference between 
the prediction error and the quantized prediction error is called 
the quantization error or the quantization noise. If the channel 
is error free, the reconstruction error at the receiver is simply the
quantization error. To see this, note (Fig. 1) that the prediction
error $e(k)$ is given by

$$e(k) = s(k) - p(k) \qquad (1)$$

where the predicted value is given by

$$p(k) = \sum_j a_j \hat{s}(k-j) \qquad (2)$$

and

$$\hat{s}(k) = e_q(k) + p(k). \qquad (3)$$

Assuming an additive noise model, the quantized prediction
error can be represented as

$$e_q(k) = e(k) + n_q(k) \qquad (4)$$

where $n_q(k)$ denotes the quantization noise. The quantized prediction
error is coded and transmitted to the receiver. If the
channel is noisy this is received as $\hat{e}_q(k)$ which is given by

$$\hat{e}_q(k) = e_q(k) + n_c(k) \qquad (5)$$

where $n_c(k)$ represents the channel noise. The output of the
receiver $\hat{s}(k)$ is thus given by

$$\hat{s}(k) = \hat{p}(k) + \hat{e}_q(k) \qquad (6)$$

where

$$\hat{p}(k) = p(k) + n_p(k) \qquad (7)$$

the additional term $n_p(k)$ being the result of the introduction
of channel noise into the prediction process. Using (1), (4), (5),
and (7) in (6) we obtain

$$\hat{s}(k) = s(k) + n_q(k) + n_c(k) + n_p(k). \qquad (8)$$


If the channel is error free, the last two terms in (8) drop out 
and the difference between the original and reconstructed signal 
is simply the quantization error. 
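
The error-free case can be checked numerically. The following is a minimal sketch, assuming a one-tap predictor with coefficient `a` and a uniform quantizer with step `step`; both choices and all function names are illustrative assumptions rather than the configuration used in the paper.

```python
import math

def dpcm_roundtrip(samples, step=2.0, a=1.0):
    """Run a one-tap DPCM encoder and an error-free decoder side by side.

    Returns the receiver output and the quantization noise n_q(k) per sample.
    """
    tx_prev = 0.0   # transmitter's previous reconstructed sample
    rx_prev = 0.0   # receiver's previous reconstructed sample
    recon, q_noise = [], []
    for s in samples:
        p = a * tx_prev                              # prediction, as in (2)
        e = s - p                                    # prediction error, (1)
        e_q = step * math.floor(e / step + 0.5)      # uniform quantizer
        q_noise.append(e_q - e)                      # n_q(k), from (4)
        tx_prev = e_q + p                            # local reconstruction, (3)
        # Receiver: with no channel noise it sees e_q unchanged.
        rx_out = a * rx_prev + e_q                   # receiver output, (6)
        rx_prev = rx_out
        recon.append(rx_out)
    return recon, q_noise
```

With an error-free channel the two predictors stay in step, so the receiver output differs from the input by exactly the quantization noise.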

When the prediction error is small, it falls into one of the 
inner levels of the quantizer, and the quantization noise is of a 
type referred to as granular noise. If the prediction error falls in
one of the outer levels of the quantizer, the incurred quantization
error is called overload noise. Because of the way the granular 
noise is generated it is generally smaller in magnitude than the 
overload noise and is bounded by the size of the quantization 
interval. The overload noise on the other hand is essentially 
unbounded and can become very large depending on the size of 
the prediction error. As edge pixels are rather difficult to predict, 
the corresponding prediction error is generally large, and this 
leads to large overload noise values. Furthermore, because this 
error affects not only the reconstruction of the current pixel,
but also future predictions, the prediction errors corresponding 
to the next few pixels also tend to be large, leading to an edge 
“smearing” effect.

Reduction of the edge degradation can therefore be obtained 
by reducing or eliminating the slope overload noise. Reduction
of the slope overload noise can be obtained by improving
the prediction process. Gibson [4] analyzed ADPCM systems 
with backward adaptive prediction, and showed that the track- 
ing ability of the adaptive predictor can be improved by the 
addition of zeros in the predictor. Motivated by these results, 
Sayood and Schekall [5] designed ADPCM systems for image 
coding with ARMA predictors. Their results show that some 
reduction in the edge degradation is possible with the use of 
adaptive zeros in the predictor. While the use of these predictors
improves the edge reconstruction there is still significant
degradation in the edges. One technique to further improve the 
edge performance was developed by Schekall and Sayood [6], 
which uses the Jayant quantizer as an edge detector. The overload
noise is then reduced by sending a quantized representation
of the noise through a side channel. The advantage of this approach
is that it can be added to existing ADPCM systems.
The disadvantage is that the use of a side channel introduces 
synchronization problems. In this paper we propose a different 
approach for edge preservation which does not require a side 
channel. This approach is described in the following section. 

Proposed Approach 

The approach taken in this paper is a variation on the standard 
rate-distortion tradeoff. The basic idea is that the slope overload
noise can be reduced by increasing the rate. However, rather
than increasing the rate for encoding each and every pixel, there 
is only an instantaneous rate increase whenever slope overload 
is encountered. The way this is implemented is outlined in the 
block diagram of Figure 2. A DPCM system is followed by a 
lossless encoder at the transmitter. At the receiver the inverse 
operations are performed. The DPCM system differs from stan- 
dard DPCM systems in that the quantizer being used has an 
unlimited number of levels. In practice what this means is that 
if the input has 256 levels, which is standard for monochrome 
images, then the DPCM quantizer will have 512 levels. This 
effectively eliminates the overload noise making the distortion 
a function of the quantizer stepsize $\Delta$. Of course by itself it
also eliminates any compression that may have been desired; in
fact it requires an increase of one bit in the rate. The compression
is obtained by use of the lossless encoder. The lossless
encoder output alphabet consists of $N$ codewords. These codewords
correspond to $N$ consecutive levels in the quantizer. Let
the smallest level be labeled $x_L$ and the largest level be labeled
$x_H$. If the quantizer output $e_q(k)$ is a level between $x_L$ and
$x_H$, then the lossless encoder puts out the corresponding channel
symbol. If, however, $e_q(k)$ is greater than $x_H$, the encoder
puts out the symbol corresponding to $x_H$. A new value $e_{q1}(k)$
is then obtained by subtracting $x_H$ from $e_q(k)$. If this value is
less than $x_H$ then it is encoded using the corresponding codeword
in the lossless encoder output alphabet. Otherwise, $x_H$ is
again subtracted from $e_{q1}(k)$ to generate $e_{q2}(k)$. This process is
continued till some $e_{qn}(k)$ where

$$e_{qn}(k) = e_q(k) - n x_H$$

and $e_{qn}(k)$ is less than $x_H$. A similar strategy is followed when
$e_q(k) < x_L$. Thus the instantaneous rate is increased by a function
of $n$ whenever the prediction error falls outside the closed
interval $[x_L, x_H]$.

Example: Consider a DPCM system with a stepsize $\Delta$ of 2
where the input-output relationship is given by

$$Q[x] = 2k \quad \text{if } 2k - 1 \le x < 2k + 1; \quad k = 0, \pm 1, \pm 2, \ldots$$

Let the lossless encoder output alphabet be of size eight with
$x_L = -4$ and $x_H = 10$. If the input $e(k)$ is 7 the quantizer output
$e_q(k)$ is 8, which is in the lossless encoder output alphabet
and therefore this value is encoded as a single codeword. If $e(k)$
is 15 then $e_q(k)$ is 16, which is larger than $x_H$. In this case,
the encoder puts out the codeword corresponding to $x_H$ and
generates $e_{q1}(k) = 16 - 10 = 6$, which is in the encoder output
alphabet. Therefore, the encoder output consists of two codewords
representing $x_H$ (10) and 6. If the input is -7, $e_q(k) = -6$,
which is less than $x_L$. Thus the lossless encoder output consists
of two symbols, one corresponding to the value of $x_L$ (-4) and
one corresponding to the value of -2. Note that if the input is
10 or -4 (i.e. $x_H$ or $x_L$) then the output will be the sequence
10, 0 or -4, 0.
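
The example can be reproduced with a short sketch. The function names below are hypothetical, and the code assumes $x_L < 0 < x_H$ and that a level equal to $x_L$ or $x_H$ is treated as an escape symbol, as the last sentence of the example implies.

```python
import math

def quantize(x, step=2):
    """Uniform quantizer: Q[x] = step*k for step*k - step/2 <= x < step*k + step/2."""
    return step * math.floor((x + step / 2) / step)

def escape_encode(e_q, x_low, x_high):
    """Map one quantizer level to a list of symbols in [x_low, x_high]."""
    symbols = []
    while e_q >= x_high:          # too large: emit x_high and subtract it
        symbols.append(x_high)
        e_q -= x_high
    while e_q <= x_low:           # too small: emit x_low and subtract it
        symbols.append(x_low)
        e_q -= x_low
    symbols.append(e_q)           # remainder lies strictly inside the range
    return symbols

def escape_decode(symbols, x_low, x_high):
    """Invert escape_encode for a stream of symbols."""
    values, acc = [], 0
    for sym in symbols:
        acc += sym
        if x_low < sym < x_high:  # a non-escape symbol terminates the value
            values.append(acc)
            acc = 0
    return values
```

For instance, `escape_encode(quantize(15), -4, 10)` gives the two symbols 10 and 6 described in the example, and decoding the concatenated symbol stream recovers the original quantizer levels.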

One of the consequences of this type of encoding is that it
can generate runs of $x_L$ and $x_H$ whenever the image contains
a large number of edges. Fortunately the encoding scheme also
provides a significant number of special symbols that can be used
to encode these runs. For example, the sequence $x_H$ followed by
a negative value and the sequence $x_L$ followed by a positive value
would not occur in the normal course of events. These sequences
can therefore be used to encode the runlengths of $x_L$ and $x_H$.
Consider for example a system in which $\Delta$ is 2 and $x_L$ is -4.
The output of the lossless encoder therefore corresponds to the
values -4, -2, 0, 2, 4. In the standard system a value of 4 is
always followed by a value of 0 or 2. Similarly a value of -4 is
always followed by a value of 0 or -2. Therefore, the sequences
4 -2 and -4 2 can be used as special symbols to denote runs of
4 or -4. A simple strategy is to replace every two 4's (or -4's)
after the initial 4 by a -2 (or 2). For example a value of 10
would still be represented by 4 4 2. However a value of 14 would
be represented by 4 -2 2 instead of 4 4 4 2. Similarly a value
of 18 would be represented by 4 -2 4 2 and a value of 22 would
be represented by 4 -2 -2 2. For this particular scheme, a run of
length $n$ would be represented by $n - \lfloor (n-1)/2 \rfloor$ codewords. When
the size of the lossless encoder output is increased, the number
of special symbols available also increases and the coding of the 
runs can be performed more efficiently. 
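
The run substitution described above can be sketched as follows. This is an illustrative reading of the "replace every two 4's after the initial 4 by a -2" rule for the five-level alphabet -4, -2, 0, 2, 4; the function name and default parameters are assumptions.

```python
def encode_with_runs(e_q, x_low=-4, x_high=4, step=2):
    """Encode one quantizer level, compressing runs of escape symbols.

    After the first escape symbol, every pair of further escapes is
    replaced by one marker: -step after x_high, +step after x_low.
    """
    escape = x_high if e_q >= x_high else x_low
    run = 0
    while e_q >= x_high or e_q <= x_low:   # count escapes, keep the remainder
        e_q -= escape
        run += 1
    if run == 0:
        return [e_q]
    marker = -step if escape > 0 else step
    rest = run - 1                          # escapes after the initial one
    return [escape] + [marker] * (rest // 2) + [escape] * (rest % 2) + [e_q]
```

A run of `n` escapes is emitted as `1 + (n - 1) // 2 + (n - 1) % 2` codewords, which equals `n - (n - 1) // 2`.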

These special sequences can also be used to signal a change 
of rate for applications in which the available channel capacity 
changes with time. The actual change can be accommodated by
changing the stepsize and reducing the lossless encoder codebook 
size by the same amount. Several of the systems proposed above 
were simulated. The results of these simulations are presented 
in the next section. 


Results 


Four systems of the type described in the previous section have
been simulated. Two of the systems simulated use a one tap
fixed predictor, while the other two use a one pole four zero
predictor with the zeros being adaptive. One of the systems in
each case contains the lossless encoder followed by a runlength
encoder while the other contains only the lossless encoder without
the runlength encoder. The test images used were the USC
GIRL image, and the USC COUPLE image. Both are 256 x
256 monochrome eight bit images and have been used often as
test images. The objective performance measures were the Peak
Signal to Noise Ratio (PSNR) and the Mean Absolute Error
(MAE) which are defined as follows:


$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\langle (s(k) - \hat{s}(k))^2 \rangle}$$

$$\mathrm{MAE} = \langle |s(k) - \hat{s}(k)| \rangle$$

where $\langle \cdot \rangle$ denotes the average value.
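
As a concrete reading of these two definitions, the sketch below computes both measures for a pair of equal-length pixel sequences. The function name is hypothetical, and the peak value of 255 follows from the eight bit images used here.

```python
import math

def psnr_and_mae(original, reconstructed, peak=255):
    """Return (PSNR in dB, MAE) for two equal-length pixel sequences."""
    n = len(original)
    diffs = [s - r for s, r in zip(original, reconstructed)]
    mse = sum(d * d for d in diffs) / n     # average squared error
    mae = sum(abs(d) for d in diffs) / n    # average absolute error
    # A perfect reconstruction has zero error and unbounded PSNR.
    psnr = math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)
    return psnr, mae
```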

Several initial test runs were performed using different numbers
of levels, different values of $x_L$, and different values of $\Delta$
to get a feel for the optimum values of the various parameters
(given $x_L$ and $\Delta$, $x_H$ is automatically determined). We found
that an appropriate way of selecting the value of $x_L$ was using
the relationship


where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$, and $N$
is the size of the alphabet of the lossless coder. This provides 
a symmetric codebook when the alphabet size is odd, and a 
codebook skewed to the positive side when the alphabet size is 
even. The zero value is always in the codebook. 
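
To illustrate a codebook with the properties just described, the sketch below assumes $x_L = -\lfloor (N-1)/2 \rfloor \Delta$. This particular formula is an assumption chosen because it reproduces those properties (symmetric for odd $N$, skewed to the positive side for even $N$, zero always present); the function name is hypothetical.

```python
def codebook_levels(n_symbols, step):
    """Return n_symbols consecutive quantizer levels starting at x_L.

    Assumption: x_L = -floor((N - 1) / 2) * step.
    """
    x_low = -((n_symbols - 1) // 2) * step
    return [x_low + i * step for i in range(n_symbols)]
```

Under this assumption, `codebook_levels(5, 2)` gives -4, -2, 0, 2, 4, and `codebook_levels(8, 2)` gives -6 through 8, with one more positive level than negative.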

As the alphabet size is usually not a power of two, the binary
code for the output alphabet will be a variable length code. The
use of variable length codes always brings up issues of robustness
with respect to changing input statistics. With this in mind,
the rate was calculated in two different ways. The first was to
find the output entropy, and scale it up by the ratio of symbols
transmitted to the number of pixels encoded. We call this rate
the entropy rate, which is the minimum rate obtainable if we
assume the output of the lossless encoder to be memoryless.

While this assumption is not necessarily true, the entropy rate 
gives us an idea about the best we can do with a particular 
system. We will treat it as the lower bound on the obtainable 
rate. We also calculated the rate using a predetermined variable 
length code. This code was designed with no prior knowledge 
of the probabilities of the different letters. The only assumption 
was that the letters representing the inner levels of the quantizer 
were always more likely than the letters representing the outer 
levels of the quantizer. The code tree used is shown in Figure 3. 
Obviously, this will become highly inefficient in the case of small
alphabet size and small $\Delta$, as in this case, the outer levels $x_L$
and $x_H$ will occur quite frequently. This rate can be viewed as
an upper bound on the achievable rate. 
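
The entropy rate calculation described above (first-order entropy of the symbol stream, scaled by symbols per pixel) can be sketched as follows. The function name is hypothetical and the symbol stream is assumed to be available as a list.

```python
import math
from collections import Counter

def entropy_rate(symbols, num_pixels):
    """First-order entropy in bits per symbol, scaled to bits per pixel."""
    total = len(symbols)
    counts = Counter(symbols)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # More than one symbol may be sent per pixel, so scale by symbols/pixels.
    return entropy * total / num_pixels
```

For example, four equally likely binary symbols sent for two pixels carry one bit per symbol and therefore two bits per pixel.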

The results for the system with a one tap predictor and with- 
out the runlength encoder are shown in Tables 1 and 2. Table 1 
contains the results for the COUPLE image, while Table 2 contains
the results for the GIRL image. In the tables $R_L$ denotes
the entropy rate while $R_U$ is the rate obtained using the Huffman
code of Figure 3. Recall that for image compression schemes,
systems with PSNR values of greater than 35 dB are percep- 
tually almost identical. As can be seen from the PSNR values 
in the tables there is very little degradation with rate, and in 
fact if we use the 35 dB criterion there is almost no degrada- 
tion in image quality until the rate drops below two bits per 
pixel. This can be verified by the reconstructed images shown 
in Figure 4. Each picture in Figure 4 consists of the original 
image, the reconstructed image and the error image magnified 
10 fold. In each of the pictures, it is extremely difficult to tell 
the source or original image from the reconstructed or output 
image. In fact, in the case of the image coded at rates above 
two bits per pixel it is well nigh impossible. This subjective ob- 
servation is supported by the error images in each case which 
are uniform in texture throughout without any of the standard 
edge artifacts which can be usually seen in the error images for 
most compression schemes. 

We can see from the results that if the value of $\Delta$ and hence
$x_L$ is fixed, the size of the codebook has no effect on the performance
measures. This is because the only effect of reducing the
codebook size under these conditions is to increase the number 
of symbols transmitted. While this has the effect of increasing 
the rate, because of the way the system is constructed, it does 
not influence the resulting distortion. The drop in rate for the 
same distortion as the alphabet size increases can be clearly seen 
from the results in Tables 1 and 2. 
Table 3 shows the decrease in rate when a simple runlength
coder is used. The runlength coder encodes long strings of $x_L$
and $x_H$ using the special sequences mentioned previously. As
can be seen from the results the improvement provided by the
current runlength encoding scheme is significant only for small
alphabets and small values of $\Delta$. This is because it is under
these conditions that most of the long strings of $x_L$ and $x_H$ are
generated. However we are not as yet using many of the special 
sequences in the larger alphabet codebooks, so there is certainly 
room for improvement. 

The one tap predictor was replaced with an adaptive ARMA 
predictor with a fixed pole and four adaptive zeros. The fixed 
pole was at a lag of 257 (pixel above) while the zeros were at 
lags of one, two, three and four. The adaptation was performed
using a simple LMS algorithm as follows. Let $B_k$ be the vector
of predictor coefficients at time $k$. The adaptation algorithm was

$$B_{k+1} = B_k + \mu\, e_q(k)\, E_k$$

where $\mu$ is the adaptation stepsize and

$$E_k = (e_q(k-1), e_q(k-2), e_q(k-3), e_q(k-4))^T.$$
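
A single adaptation step can be sketched as below, assuming the standard LMS form in which each coefficient moves by the stepsize times the current quantized prediction error times the corresponding past quantized error. The function and argument names are hypothetical, and the use of $e_q(k)$ as the error term is an assumption.

```python
def lms_update(coeffs, past_errors, e_q, mu):
    """One LMS step: b_i <- b_i + mu * e_q(k) * e_q(k - i).

    coeffs      -- current adaptive (zero) coefficients, length 4 here
    past_errors -- [e_q(k-1), e_q(k-2), e_q(k-3), e_q(k-4)]
    e_q         -- current quantized prediction error (assumed error term)
    mu          -- adaptation stepsize
    """
    return [b + mu * e_q * x for b, x in zip(coeffs, past_errors)]
```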

The results from using this predictor are shown in Tables 4, 5 
and 6. While there is some improvement in all cases, the results 
for the COUPLE image show a greater improvement than the
results for the GIRL image. This can be explained by noting 
that the COUPLE image contains many more edges than the 
GIRL image. As the ARMA predictor tends to improve predic- 
tor performance when edges are encountered, the improvement 
in performance occurs in the image with more edges. 

Conclusion


We have demonstrated a simple image coding scheme which is
very easy to implement in realtime and has excellent edge preservation
properties over a wide range of rates.

This system would be especially useful in transmitting images
over channels where the available bandwidth may vary.
The edge preserving quality is especially useful in the encoding
of scientific and medical images.
                            Size = 3          Size = 5          Size = 8
Delta  MAE     PSNR (dB)   RL      RU        RL      RU        RL      RU
2      0.5067  51.0830     6.1615  7.1418    4.9334  6.8635    4.4404  6.6884
4      1.4790  42.7898     3.8909  4.0587    3.3637  2.7982    3.1673  3.6939
6      2.4676  38.6565     2.9577  3.0137    2.6553  2.7729    2.5490  2.7023
8      3.3697  36.0009     2.4314  2.4972    2.2327  2.2756    2.1662  2.2267
12     5.1359  32.3682     1.6277  1.9800    1.7233  1.7963    1.6930  1.7669

Table 1: Performance results for the COUPLE image, alphabet size 3, 5 and 8.
                     Size = 3          Size = 5          Size = 8
MAE     PSNR (dB)   RL      RU        RL      RU        RL      RU
0.5067  51.0830     6.2821  7.8120    5.0554  7.4713    4.5635  7.1275
1.4790  42.7898     4.0088  4.3976    3.7414  4.0592    3.2668  3.8740
2.4676  38.6565     3.0819  3.2547    2.7570  2.9279    2.6468  2.8063
3.3697  36.0009     2.5543  2.6860    2.3272  2.3783    2.2617  2.2931
5.1359  32.3682     1.9426  2.1122    1.8046  1.8439    1.7786  1.8009

Table 2: Performance results for the GIRL image, alphabet size 3, 5 and 8.
                   Size = 3      Size = 5      Size = 8
MAE    PSNR (dB)   RL    RU      RL    RU      RL    RU
1.59   46.11       4.71  5.00    3.94  4.77    3.63  4.69
2.00   40.71       3.02  3.04    2.70  2.82    2.50  2.76
2.96   37.42       2.33  2.38    2.14  2.18    2.09  2.13
3.86   35.11       1.94  2.05    1.81  1.87    1.79  1.83
5.61   31.79       1.49  1.72    1.42  1.56    1.41  1.55

Table 4: Performance results for COUPLE image with adaptive ARMA predictor.



                          Size = 3      Size = 5      Size = 8
Delta  MAE    PSNR (dB)   RL    RU      RL    RU      RL    RU
2      1.07   45.99       5.66  6.33    4.59  6.06    4.18  5.92
4      2.06   40.55       3.60  3.69    3.15  3.42    2.99  3.32
6      3.06   37.15       2.78  2.82    2.51  2.56    2.42  2.45
8      4.04   34.75       2.31  2.38    2.12  2.14    2.07  2.09
12     6.08   31.23       1.79  1.95    1.66  1.73    1.65  1.70

Table 5: Performance results for GIRL image with adaptive ARMA predictor.
        Without RL  With RL    Without RL  With RL    Without RL  With RL
Delta   Encoder     Encoder    Encoder     Encoder    Encoder     Encoder
2       4.71        4.25       3.94        3.70       3.03        3.57
4       3.02        2.86       2.70        2.67       2.60        2.59
6       2.33        2.26       2.14        2.13       2.09        2.00
8       1.94        1.90       1.81        1.81       3.79        3.79
12      1.49        1.45       1.42        1.42       3.43        1.41

Table 6: Comparison of entropy rates between systems with and without
the runlength encoder for the COUPLE image.



Figure 4(a). GIRL image coded at entropy rate of 1.7 bpp.



Figure 4(b). GIRL image coded at entropy rate of 1.5 bpp.
An Edge Preserving Differential Image Coding Scheme 


Abstract 

Differential encoding techniques are fast and easy to implement. However, a major problem 
with the use of differential encoding for images is the rapid edge degradation encountered 
when using such systems. This makes differential encoding techniques of limited utility 
especially when coding medical or scientific images, where edge preservation is of utmost 
importance. We present a simple, easy to implement differential image coding system with 
excellent edge preservation properties. The coding system can be used over variable rate 
channels which makes it especially attractive for use in the packet network environment. 


1. Introduction 


The transmission and storage of digital images requires an enormous expenditure of resources, 
necessitating the use of compression techniques. These techniques include relatively low complexity
predictive techniques such as Adaptive Differential Pulse Code Modulation (ADPCM) and its
variations, as well as relatively higher complexity techniques such as transform coding and vector
quantization [1,2]. Most compression schemes were originally developed for speech and their application
to images is at times problematic. This is especially true of the low complexity predictive
techniques. A good example of this is the highly popular ADPCM scheme. Originally designed for
speech [3], it has been used with other sources with varying degrees of success. A major problem


with its use in image coding is the rapid degradation in quality whenever an edge is encountered. 
Edges are perceptually very important and therefore their degradation can be perceptually very 
annoying. If the images under consideration contain medical or scientific data, the problem becomes
even more important, as edges provide position information which may be crucial to the
viewer. This poor edge reconstruction quality has been a major factor in preventing ADPCM from
becoming as popular for image coding as it is for speech coding. While good edge reconstruction
capability is an important requirement for image coding schemes, another requirement that is gaining
in importance with the proliferation of packet switched networks, is the ability to encode the
image at different rates. In a packet switched network, the available channel capacity is not a fixed 
quantity, but rather fluctuates as a function of the load on the network. The compression scheme 
must therefore be capable of taking advantage of increased capacity when it becomes available while 
providing graceful degradation when the rate decreases to match decreased available capacity. 

In this paper we describe a DPCM based coding scheme which has the desired properties listed 
above. It is a low complexity scheme with excellent edge preservation in the reconstructed image. It 
takes full advantage of the available channel capacity providing lossless compression when sufficient 
capacity is available, and very graceful degradation when a reduction in rate is required. 

2. Notation and Problem Formulation 

The DPCM system consists of two main blocks, the quantizer and the predictor (see Fig. 1). The 
predictor uses the correlation between samples of the waveform s(k) to predict the next sample 
value. This predicted value is removed from the waveform at the transmitter and reintroduced at 
the receiver. The prediction error is quantized to one of a finite number of values which is coded 
and transmitted to the receiver and is denoted by $e_q(k)$. The difference between the prediction
error and the quantized prediction error is called the quantization error or the quantization noise.
If the channel is error free, the reconstruction error at the receiver is simply the quantization error.
To see this, note (Fig. 1) that the prediction error $e(k)$ is given by

$$e(k) = s(k) - p(k) \qquad (1)$$

where $s(k)$ is the original signal predicted by $p(k)$ which is given by

$$p(k) = \sum_j a_j \hat{s}(k-j), \qquad (2)$$

$$\hat{s}(k) = e_q(k) + p(k). \qquad (3)$$

Assuming an additive noise model, the quantized prediction error $e_q(k)$ can be represented as

$$e_q(k) = e(k) + n_q(k) \qquad (4)$$

where $n_q(k)$ denotes the quantization noise. The quantized prediction error is coded and transmitted
to the receiver. If the channel is noisy this is received as $\hat{e}_q(k)$ which is given by

$$\hat{e}_q(k) = e_q(k) + n_c(k) \qquad (5)$$

where $n_c(k)$ represents the channel noise. The output of the receiver $\hat{s}(k)$ is thus given by

$$\hat{s}(k) = \hat{p}(k) + \hat{e}_q(k), \qquad (6)$$

where

$$\hat{p}(k) = p(k) + n_p(k) \qquad (7)$$

the additional term $n_p(k)$ being the result of the introduction of channel noise into the prediction
process. Using (1), (4), (5), and (7) in (6) we obtain

$$\hat{s}(k) = s(k) + n_q(k) + n_c(k) + n_p(k). \qquad (8)$$


If the channel is error free, the last two terms in (8) drop out and the difference between the original
and reconstructed signal is simply the quantization error.

When the prediction error is small, it falls into one of the inner levels of the quantizer, and 
the quantization noise is of a type referred to as granular noise. If the prediction error falls in 
one of the outer levels of the quantizer, the incurred quantization error is called overload noise. 
Granular noise is generally smaller in magnitude than the overload noise and is bounded by the 
size of the quantization interval. The overload noise on the other hand is essentially unbounded 
and can become very large depending on the size of the prediction error. As edge pixels are rather
difficult to predict, the corresponding prediction error is generally large, and this leads to large
overload noise values. Furthermore, because this error affects not only the reconstruction of the
current pixel, but also future predictions, the prediction errors corresponding to the next few pixels
also tend to be large, leading to an edge “smearing” effect.

Reduction of the edge degradation can therefore be obtained by reducing or eliminating the 
slope overload noise. Reduction of the slope overload noise can be obtained by improving the 
prediction process. Gibson [4] analyzed ADPCM systems with backward adaptive prediction, and
showed that the tracking ability of the adaptive predictor can be improved by the addition of zeros
in the predictor. Motivated by these results, Sayood and Schekall [5] designed ADPCM systems
for image coding with ARMA predictors. Their results show that some reduction in the edge 
degradation is possible with the use of adaptive zeros in the predictor. While the use of these 
predictors improves the edge reconstruction there is still significant degradation in the edges. One 
technique to further improve the edge performance was developed by Schekall and Sayood [6], which 
uses the Jayant quantizer as an edge detector. The overload noise is then reduced by sending a 
quantized representation of the noise through a side channel. The advantage of this approach is 
that it can be added to existing ADPCM systems. The disadvantage is that the use of a side 
channel introduces synchronization problems. In this paper we propose a different approach for 
edge preservation which does not require a side channel. This approach is described in the following 
section. 

3. Proposed Approach 

The approach taken in this paper is a variation on the standard rate-distortion tradeoff. The basic 
idea is that the slope overload noise can be reduced by increasing the rate. However rather than 
increasing the rate for encoding each and every pixel, there is only an instantaneous rate increase 
whenever slope overload is encountered. The way this is implemented is outlined in the block 
diagram of Fig. 2. A DPCM system is followed by a lossless encoder at the transmitter. At the 
receiver the inverse operations are performed. The DPCM system differs from standard DPCM 
systems in that the quantizer being used effectively has an unlimited number of levels. In practice
what this means is that if the input has 256 levels, which is standard for monochrome images, then
the DPCM quantizer will have 512 levels. This effectively eliminates the overload noise making
the distortion a function of the quantizer stepsize $\Delta$. Of course by itself it also eliminates any
compression that may have been desired; in fact it requires an increase of one bit in the rate.
The compression is obtained by use of the lossless encoder. The lossless encoder output alphabet
consists of $N$ codewords. These codewords correspond to $N$ consecutive levels in the quantizer.
Let the smallest level be labeled $x_L$ and the largest level be labeled $x_H$. If the quantizer output
$e_q(k)$ is a level between $x_L$ and $x_H$, then the lossless encoder puts out the corresponding channel
symbol. If, however, $e_q(k)$ is greater than $x_H$, the encoder puts out $n$ symbols corresponding to
$x_H$ and a symbol corresponding to $e_{qn}(k)$ where

$$n = \lfloor e_q(k)/x_H \rfloor \quad \text{and} \quad e_{qn}(k) = e_q(k) \pmod{x_H}.$$

A similar strategy is followed when $e_q(k) < x_L$. Thus the instantaneous rate is increased by a
function of $n$ whenever the prediction error falls outside the closed interval $[x_L, x_H]$.
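
The closed form above amounts to an integer division with remainder. The sketch below is a hypothetical helper that assumes it is only called for levels outside the open interval, with the limit on the matching side ($x_H$ for large positive levels, $x_L$ for large negative ones).

```python
def escape_count_and_remainder(e_q, x_limit):
    """Return (n, e_qn) such that e_q = n * x_limit + e_qn.

    Python's floored division gives a remainder with the sign of
    x_limit, so the same call handles x_H > 0 and x_L < 0. It is
    meant only for levels at or beyond the limit on that side.
    """
    return divmod(e_q, x_limit)
```

With $x_H = 10$ a level of 16 yields one escape symbol and a remainder of 6; with $x_L = -4$ a level of -6 yields one escape symbol and a remainder of -2.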

One of the consequences of this type of encoding is that it can generate runs of $x_L$ and $x_H$
whenever the image contains a large number of edges. Fortunately the encoding scheme also
provides a significant number of special symbols ($x_H$ followed by a symbol for a negative level and
$x_L$ followed by a symbol for a positive level) that can be used to encode these runs. When the size
of the lossless encoder output is increased, the number of special symbols available also increases 
and the coding of the runs can be performed more efficiently. 

These special sequences can also be used to signal a change of rate for applications in which 
the available channel capacity changes with time. The actual change can be accommodated by 
changing the stepsize and reducing the lossless encoder codebook size by the same amount. Several 
of the systems proposed above were simulated. The results of these simulations are presented in 
the next section. 


4. Results 


Before we provide the results using images, let us examine the performance of the scheme when 
applied to a one-dimensional signal containing a simulated edge. This signal was first encoded using 
a five-level quantizer. The results are shown in Fig. 4(a). As can be seen, it takes a little while for
the DPCM system to catch up. In an image this would cause a smearing of the edge. When the
proposed system with the same parameters is used there is no such effect, as is clear from Fig. 4(b).
The quantizer in this case went into the recursive mode twice, once at the leading and once at the
trailing edge. To get an equivalent effect, a standard DPCM system would have to have a forty-level
quantizer. To show that this performance is maintained when the system is used with 2D images,
two systems of the type described in the previous section have been simulated. Both systems use the
following two-dimensional fixed predictor [7]:

$$p(k) = \tfrac{2}{3}\hat{s}(k-1) + \tfrac{2}{3}\hat{s}(k-256) - \tfrac{1}{3}\hat{s}(k-257).$$


One of the systems contains the lossless encoder followed by a runlength encoder while the other
contains only the lossless encoder without the runlength encoder. The test images used were the
USC GIRL image and the USC COUPLE image. Both are 256 x 256 monochrome eight-bit images
and have been used often as test images. The objective performance measures were the Peak Signal
to Noise Ratio (PSNR) and the Mean Absolute Error (MAE), which are defined as follows:


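$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\langle (s(k) - \hat{s}(k))^2 \rangle}$$

$$\mathrm{MAE} = \langle\, |s(k) - \hat{s}(k)| \,\rangle$$

where $\langle \cdot \rangle$ denotes the average value.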
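The two measures can be computed directly from their definitions. The following is a minimal sketch, assuming the original and reconstructed images are supplied as flat sequences of eight-bit pixel values; the function name `psnr_and_mae` is illustrative and does not come from the paper.

```python
import math

def psnr_and_mae(original, reconstructed):
    """Return (PSNR in dB, MAE) for two equal-length sequences of 8-bit pixels."""
    n = len(original)
    # Average squared error and average absolute error over all pixels.
    mse = sum((s - r) ** 2 for s, r in zip(original, reconstructed)) / n
    mae = sum(abs(s - r) for s, r in zip(original, reconstructed)) / n
    # A perfect reconstruction has zero error; report an infinite PSNR.
    psnr = float("inf") if mse == 0 else 10.0 * math.log10(255.0 ** 2 / mse)
    return psnr, mae
```

For a 256 x 256 image the pixels would simply be read out in raster order before calling the function.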

Several initial test runs were performed using different numbers of levels, different values of $x_L$
and different values of $\Delta$ to get a feel for the optimum values of the various parameters. (Given
$x_L$ and $\Delta$, $x_H$ is automatically determined.) We found that an appropriate way of selecting the
value of $x_L$ was using the relationship

where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$ and $N$ is the size of the alphabet of the
lossless coder. This provides a symmetric codebook when the alphabet size is odd, and a codebook
skewed to the positive side when the alphabet size is even. The zero value is always in the codebook.

As the alphabet size is usually not a power of two, the binary code for the output alphabet will
be a variable length code. The use of variable length codes always brings up issues of robustness
with respect to changing input statistics. With this in mind, the rate was calculated in two different
ways. The first was to find the output entropy, and scale it up by the ratio of symbols transmitted
to the number of pixels encoded. We call this rate the entropy rate, which is the minimum rate
obtainable if we assume the output of the lossless encoder to be memoryless. While this assumption
is not necessarily true, the entropy rate gives us an idea about the best we can do with a particular
system. We also calculated the rate using a predetermined variable length code. This code was
designed with no prior knowledge of the probabilities of the different letters. The only assumption
was that the letters representing the inner levels of the quantizer were always more likely than
the letters representing the outer levels of the quantizer. The code tree used is shown in Fig. 3.
Obviously, this will become highly inefficient in the case of small alphabet size and small $\Delta$, as in
this case, the outer levels $x_L$ and $x_H$ will occur quite frequently. This rate can be viewed as an
upper bound on the achievable rate.
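The entropy rate described above can be sketched as follows. This is a minimal illustration, assuming the lossless encoder output is available as a list of symbols and that the first-order (memoryless) entropy is estimated from relative frequencies; the name `entropy_rate` is hypothetical.

```python
import math
from collections import Counter

def entropy_rate(symbols, num_pixels):
    """First-order entropy of the symbol stream, in bits per pixel."""
    total = len(symbols)
    counts = Counter(symbols)
    # Entropy in bits per transmitted symbol, estimated from relative frequencies.
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Scale by the ratio of symbols transmitted to pixels encoded.
    return entropy * total / num_pixels
```

Because the recursive quantizer can emit more than one symbol per pixel, the scaling factor is at least one, so the rate in bits per pixel is never below the per-symbol entropy.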

The results for the system without the runlength encoder are shown in Tables 1 and 2. Table 1
contains the results for the COUPLE image, while Table 2 contains the results for the GIRL image.
In the tables, $R_l$ denotes the entropy rate while $R_u$ is the rate obtained using the Huffman code
of Fig. 3. Recall that for image compression schemes, systems with PSNR values of greater than
35 dB are perceptually almost identical. As can be seen from the PSNR values in the tables there
is very little degradation with rate, and in fact if we use the 35 dB criterion there is almost no
degradation in image quality until the rate drops below two bits per pixel. This can be verified by
the reconstructed images shown in Fig. 5. Each picture in Fig. 5 consists of the original image, the
reconstructed image and the error image magnified tenfold. In each of the pictures, it is extremely
difficult to tell the original image from the reconstructed image. This subjective
observation is supported by the error images in each case, which are uniform in texture throughout,
without the edge artifacts which can usually be seen in the error images for most compression
schemes.

We can see from the results that if the value of $\Delta$ and hence $x_L$ is fixed, the size of the codebook
has no effect on the performance measures. This is because the only effect of reducing the codebook
size under these conditions is to increase the number of symbols transmitted. While this has the
effect of increasing the rate, because of the way the system is constructed it does not influence the
resulting distortion. The drop in rate for the same distortion as the alphabet size increases can be
clearly seen from the results in Tables 1 and 2.

Table 3 shows the decrease in rate when a simple runlength coder is used. The runlength coder
encodes long strings of $x_L$ and $x_H$ using the special sequences mentioned previously. As can be seen
from the results, the improvement provided by the current runlength encoding scheme is significant
only for small alphabets and small values of $\Delta$. This is because it is under these conditions that
most of the long strings of $x_L$ and $x_H$ are generated. However, we are not as yet using many of the
special sequences in the larger alphabet codebooks, so there is certainly room for improvement.

Finally, to show the effect of changing rate on the perceptual quality, the USC GIRL image was
encoded using three different rates. The top quarter of the image was encoded using a codebook
size of eight and a $\Delta$ of two, resulting in a rate of 4.37. The second quarter of the image was
encoded using a codebook of size five and a $\Delta$ of four, resulting in a rate of 2.86. The bottom half of
the image was encoded using a codebook size of three and a $\Delta$ of eight, resulting in a rate of 2.36.
The original and reconstructed images are shown in Fig. 6. The fact that the image is coded with
three different rates can only be noticed if the viewer is already aware of this fact, and then only
after very close scrutiny. The three different rates are clearly visible, though, in the magnified
error image shown in Fig. 7. This property of the coding scheme would be
extremely useful if changes in the transmission bandwidth forced the coder to operate at different
rates.


To see how this algorithm performs on a relative scale, we compare it to the differential scheme
proposed by Maragos, Schafer and Mersereau [8]. The system proposed by Maragos et al. uses a
forward adaptive two-dimensional predictor and a backward adaptive quantizer. The coefficients
are obtained over a 32x32 or a 16x16 frame and transmitted as side information. The proposed
system (we feel) is considerably simpler, because of the lack of any need for adaptation and side
information; however, the results compare favorably with the system of [8]. Comparative results
are shown in Table 5. The results were obtained by varying the stepsize $\Delta$ until the rate obtained
was similar to the rate in [8], and then comparing the PSNR. As in [8], to obtain rates below one
bit/pixel several coder outputs were concatenated into blocks which were then Huffman encoded.
For the results shown in Table 5, we used a block size of three. Given a five-level recursive quantizer,
this corresponds to an alphabet size of $5^3 = 125$, which would be somewhat excessive for a simple
implementation. (In [8] block sizes of four to eight are used with two- and three-level quantizers.)

The above comparison is not meant to indicate that the two systems being compared are 
exclusive. A case can be made for combining the good features of both systems. For example, 
the prediction scheme described in [8] could be combined with the quantization scheme described 
here. However, it was felt in this particular case that the advantages to be gained by the addition 
of a forward adaptive predictor were offset by the increase in complexity and synchronization 
requirements. 

5. Conclusion 

We have demonstrated a simple image coding scheme which is very easy to implement in real time
and has excellent edge preservation properties over a wide range of rates.

This system would be especially useful in transmitting images over channels where the available
bandwidth may vary. The edge preserving quality is especially useful in the encoding of scientific
and medical images.

                        Size = 3        Size = 5        Size = 8
Delta   MAE    PSNR     R_l    R_u      R_l    R_u      R_l    R_u

  2     0.50   51.12    4.45   4.89     3.88   4.79     3.66   4.79
  4     0.96   46.56    2.82   2.85     2.64   2.77     2.58   2.76
  8     1.86   41.17    1.78   1.85     1.74   1.80     1.73   1.79
 16     3.54   35.63    1.06   1.37     1.05   1.35     1.05   1.34
 32     6.82   30.04    0.56   1.14     0.55   1.13     0.55   1.13

Table 1: Performance results for the COUPLE image, alphabet size 3, 5 and 8.


                        Size = 3              Size = 5        Size = 8
Delta   MAE    PSNR     R_l    R_u            R_l    R_u      R_l    R_u

  2     0.49   51.17    5.01   [illegible]    4.26   5.66     4.01   5.61
  4     0.99   46.37    3.17   3.28           2.93   3.17     2.86   3.14
  8     1.98   40.79    2.02   2.05           1.96   1.99     1.95   1.97
 16     3.74   35.24    1.21   1.44           1.19   1.42     1.19   1.41
 32     6.90   29.80    0.63   1.16           0.63   1.16     0.63   1.16

Table 2: Performance results for the GIRL image, alphabet size 3, 5 and 8.


              Size = 3               Size = 5               Size = 8
         Without     With       Without     With       Without     With
Delta    RL Encoder  RL Encoder RL Encoder  RL Encoder RL Encoder  RL Encoder

  2      5.01        4.58       4.26        4.06       4.01        3.95
  4      3.17        3.05       2.93        2.90       2.86        2.86
  8      2.02        2.01       1.96        1.96       1.95        1.95
 16      1.21        1.21       1.19        1.19       1.19        1.19
 32      0.63        0.63       0.63        0.63       0.63        0.63

Table 3: Comparison of entropy rates between systems with Runlength
(RL) Encoder and without RL Encoder for the GIRL image.


              Size = 3               Size = 5               Size = 8
         Without     With       Without     With       Without     With
Delta    RL Encoder  RL Encoder RL Encoder  RL Encoder RL Encoder  RL Encoder

  2      4.45        4.05       3.88        3.66       3.66        3.61
  4      2.82        2.69       2.64        2.61       2.58        2.58
  8      1.78        1.75       1.74        1.73       1.73        1.73
 16      1.06        1.06       1.05        1.05       1.05        1.05
 32      0.56        0.55       0.55        0.55       0.55        0.55

Table 4: Comparison of entropy rates between systems with and without
the Runlength Encoder for the COUPLE image.


Results from [8]                    Results from proposed system
(Frame size = 32, 3 level AQB)      (alphabet size 5)

Rate     PSNR                       Rate     PSNR

0.74     30.3                       0.74     31.13
0.83     31.6                       0.84     32.1
0.93     32.6                       0.94     33.1
1.03     33.4                       1.03     33.9

Table 5: Comparison of proposed system with that of [8].

Figure 5(b) GIRL Image Coded at entropy rate of 1.3 bpp

A ROBUST COMPRESSION SYSTEM FOR LOW BIT RATE
TELEMETRY - TEST RESULTS WITH LUNAR DATA

Khalid Sayood and Martin C. Rost 
Department of Electrical Engineering 
University of Nebraska 


PROBLEM STATEMENT 

The output of a Gamma Ray detector is quantized using a 14 bit A/D
converter. The number of each of the $2^{14}$ or 16,384 levels occurring
in a 30 second interval is counted. In effect, a histogram of the
gamma ray events is obtained with 16,384 bins. The contents of these
bins are to be encoded without distortion and transmitted at a rate
less than or equal to 600 bits per second. Thus the contents of the
16,384 bins are to be encoded using 18,000 bits. The encoder should be
simple to implement and require only a minimal amount of buffering.
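
As a quick check of the budget implied by these figures (a worked calculation added for illustration, using only the numbers quoted above):

$$600\ \text{bits/s} \times 30\ \text{s} = 18{,}000\ \text{bits}, \qquad \frac{18{,}000\ \text{bits}}{16{,}384\ \text{bins}} \approx 1.10\ \text{bits per bin}.$$

The encoder therefore has only slightly more than one bit per histogram bin on average, which is why efficient coding of long runs of zeros matters so much in what follows.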

PROPOSED SYSTEM 


Encoder 

The contents of the bins are treated as a sequence for purposes of
encoding. The proposed system encoder can be divided into two stages
(three if a Huffman coding option is used; see Figure 1). The first
stage is a leaky differencer whose input/output relationship is given
by

$$z_n = x_n - \lfloor a x_{n-1} \rfloor$$

where $\lfloor t \rfloor$ is the largest integer less than or equal to $t$. The reason
for using a leaky differencer is to allow the effect of errors to die
out with time.

The output of the differencer forms the input for the second stage,
which is a modified runlength encoder. The encoder codebook contains
six different types of symbols.

Mn - symbols used to represent negative differencer
output values. For example, the differencer
output values -1, -2, ..., -n are represented
by the symbols M1, M2, ..., Mn, respectively.

Pn - symbols used to represent positive differencer values.
They are coded in the same way as the Mn symbols. Thus a
differencer output value of +3 would be represented by the
symbol P3.

Zn - symbols used to represent a string of zeros of
length n. Since the number of Z-symbols is
kept small, these symbols represent "short"
strings of zeros (0-strings), while the S0- and
S1-symbols to be introduced later represent
"long" 0-strings.

BR - In the encoding scheme that follows, there
will sometimes be a need to specify the end of
a sequence. The BR or break symbol is used
for this purpose.

S0XX - symbol used to represent long 0-strings. The
S0 symbol indicates that a 0-string is being
represented while X stands for a four bit
word. XX is thus an eight bit word specifying
the length of the 0-string.

S1XX - symbol used to represent long 0-strings that
are followed by a 1. It is constructed in the
same manner as the S0XX symbol.



Each symbol Mn, Pn, Zn, BR, S0, and S1 is represented by one channel
word. The number of symbols in the codebook is
o(M)+o(P)+o(Z)+3, where o(M), o(P), and o(Z) are the
number of negative source symbols, positive source symbols, and short
0-string symbols to be channel coded. If each word is represented
by 4 bits, a total of sixteen encoder symbols are possible. In our
coding scheme, o(M) is set to 2, o(Z) to 6, and o(P) to 5.
This means that if the differencer output is -2, -1, 1, 2, 3, 4, 5 or
a string of zeros of length five or less, it can be represented by a
single symbol.

What if the differencer output is a positive value
larger than five or a negative value less than -2? A concatenation
rule is used. Since o(P) is 5, the largest positive value that can be
represented by a single symbol is 5. By using P5 as a concatenation symbol,
larger source values can be coded. In this way, the source value 18 can be
coded as P5 P5 P5 P3. The receiver accumulates the P5 symbols
consecutively received until a non-P5 symbol is received. This symbol
is used to complete the current source value. In this case, the receiver
indicates the source value is 18.

In the case where the source value is a multiple of the maximum P-symbol
value, some confusion can occur in the decoding process.
Consider the coding of the source values 10 and 8. In this
case, four P-symbols (P5 P5 P5 P3) represent these values, but the
receiver decodes them as 18. To solve this problem the break
symbol (BR) is used. It carries no source value, but is used
by the receiver to prematurely stop the accumulation of P-symbols.
Specifically, 10 and 8 are coded as P5 P5 BR P5 P3. The receiver
stops constructing the first source value when the BR is encountered
and starts constructing the next with the following P5 symbol.

If a source value to be coded is negative, the above procedure is used
with the allowed M-symbols along with the BR symbol to prevent
incorrect receiver decoding. For example, -3 would be encoded as M2 M1
and -4 would be encoded as M2 M2 BR.
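
The concatenation rule for nonzero differencer outputs can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, symbols are written as strings such as "P5", and it is assumed that BR is always emitted after an exact multiple of the maximum symbol value, which agrees with the examples in the text.

```python
MAX_P, MAX_M = 5, 2  # o(P) and o(M) as chosen in the text

def encode_value(v):
    """Encode one nonzero differencer output as P-, M- and BR-symbols."""
    if v == 0:
        raise ValueError("zeros are coded with Z-, S0- and S1-symbols")
    prefix, limit = ("P", MAX_P) if v > 0 else ("M", MAX_M)
    full, rest = divmod(abs(v), limit)
    symbols = [f"{prefix}{limit}"] * full          # concatenation symbols
    # A remainder completes the value; an exact multiple needs a break.
    symbols.append(f"{prefix}{rest}" if rest else "BR")
    return symbols

def decode_values(symbols):
    """Invert encode_value over a stream of P-, M- and BR-symbols."""
    values, acc = [], 0
    for sym in symbols:
        if sym == "BR":
            values.append(acc)
            acc = 0
            continue
        sign = 1 if sym[0] == "P" else -1
        limit = MAX_P if sign > 0 else MAX_M
        n = int(sym[1:])
        acc += sign * n
        if n < limit:          # a non-maximal symbol ends the value
            values.append(acc)
            acc = 0
    return values
```

With these definitions, `encode_value(18)` gives P5 P5 P5 P3, and the stream P5 P5 BR P5 P3 decodes to the two values 10 and 8, matching the examples above.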

In this particular application, the tails of a given signal frame
contain long runs of zeros that are separated by non-zero data
values. It is very likely that these 0-string separators take the
value 1. Thus, it is beneficial to code these runs with one of the
following two symbols, each of which is three code words in length:

S0 x y    a 0-string of length xy (base 16).

S1 x y    a 0-string of length xy (base 16) followed by a 1.

For example, the symbol S0 4 0 represents a string of 64 0s, and the
symbol S1 4 0 represents a string of 64 0s followed by a 1. If the
separating data value is not 1, then additional source symbols follow
the S0 symbol to complete the description of its value. The maximum
length of 0-string that can be coded with this type of symbol is 255 (FF
in base 16). If a string of length greater than 255 is encountered, a
concatenation rule must be applied.


Since the symbols S0 0 0 and S1 0 0 are not assigned, they are used as
0-string concatenation symbols. They are used to indicate the fact
that a 0-string is to be built whose length is greater than 255. Each
time one of these symbols is used it is assumed that a 0-string of
length greater than 255 is being coded, and additional information is
to be provided on its length by the following symbols. A 0-string is
terminated if the last S0-symbol indicates a length value other than
00 for xy.

For example, if a 0-string of length 300 is followed by a 1, two
source symbols (six channel words) are required to code the string: S1
0 0 S1 2 D. The value for xy of the first symbol is 00, so the
0-string is continued using the following S1-symbol(s). In this way
0-strings of arbitrary length can be constructed by concatenating as
many S1 0 0 symbols as needed to bring the overall constructed
string length to within 255 0s of its full length. The first
symbol in such a series which does not have xy equal to 00
terminates the 0-string concatenation process. Since the S1 symbol is
being used, this 0-string is automatically followed by a 1. Consider
coding a 0-string of length 300 that is followed by a -1. Two S0
symbols (six channel words) are required to code the 0-string, and
one M-symbol (one channel word) is required to code the -1: S0 0 0 S0
2 D M1, for a total of seven channel words.
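
The long-run rule can be sketched as below. This is an illustrative reading of the text, not the authors' code: each channel word is shown as a string, the function names are hypothetical, and each concatenation symbol (xy = 00) is assumed to stand for 255 zeros, which is consistent with the 300-zero example (255 + 45, and 45 is 2D in base 16).

```python
def encode_long_run(length, followed_by_one=False):
    """Code a run of zeros with S0 (or S1, if a 1 follows) channel words."""
    if length < 1:
        raise ValueError("run length must be positive")
    tag = "S1" if followed_by_one else "S0"
    words = []
    while length > 255:
        words += [tag, "0", "0"]   # concatenation symbol: 255 more zeros
        length -= 255
    # Final symbol carries the remaining length as two hex digits.
    words += [tag, format(length >> 4, "X"), format(length & 0xF, "X")]
    return words

def decode_long_run(words):
    """Return (run length, True if the run is followed by a 1)."""
    total, tag = 0, None
    for i in range(0, len(words), 3):
        tag, hi, lo = words[i:i + 3]
        value = int(hi + lo, 16)
        total += value if value else 255
    return total, tag == "S1"
```

For instance, `encode_long_run(300, True)` yields the six channel words S1 0 0 S1 2 D given in the text.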

Since the long runlength symbols require three channel words each, an
excessive amount of channel capacity can be wasted when coding short
runs of 0s. As a consequence, a group of short run symbols that require
only one channel word each are used to alleviate this problem. The
identifier for these symbols is Zn (where n is the length of
the 0-string). For example, a run of five 0s is represented by the
symbol Z5. Channel bits are saved when using the Z-symbols instead of
the S0- and S1-symbols.

Consider the following example for coding a string of 10 0s. Since
o(Z) is 6, to code this 0-string using Z-symbols takes two channel
words: Z6 Z4. But when coded using an S0-symbol it takes three
channel words to code this 0-string: S0 0 A. Therefore, the Z-symbol
coding is more channel efficient. Since the S0- and S1-symbols
require three channel words, the only way to guarantee that short
strings are coded efficiently is to set the maximum number of short
symbols in a single 0-string coding to two. Thus, for an o(Z) of 6,
the maximum 0-string length to be Z-symbol coded is 12.
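
The choice between short and long run symbols can be sketched as follows. This is a hedged illustration: the function name `encode_zero_run` is hypothetical, channel words are shown as strings, only the S0 form of the long symbol is used, and the long-run branch assumes that a concatenation symbol (xy = 00) stands for 255 zeros.

```python
MAX_Z = 6                  # o(Z) as chosen in the text
MAX_SHORT = 2 * MAX_Z      # longest run coded with Z-symbols (12)

def encode_zero_run(length):
    """Return the channel words for a run of `length` zeros."""
    if length < 1:
        raise ValueError("run length must be positive")
    if length <= MAX_SHORT:
        # At most two one-word Z-symbols, e.g. 10 zeros -> Z6 Z4.
        first = min(length, MAX_Z)
        words = [f"Z{first}"]
        if length > first:
            words.append(f"Z{length - first}")
        return words
    # Longer runs fall back to three-word S0 symbols.
    words = []
    while length > 255:
        words += ["S0", "0", "0"]
        length -= 255
    return words + ["S0", format(length >> 4, "X"), format(length & 0xF, "X")]
```

A run of 10 zeros costs two channel words this way (Z6 Z4) instead of the three needed for S0 0 A, while a run of 13 zeros is the shortest that uses an S0 symbol.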

statement. No claims are made regarding its suitability for other
tasks. The second characteristic is its simplicity. The encoding
operation requires a very small amount of computation. Furthermore,
the onboard memory requirements for buffering are minimal.

If Huffman coding is to be used, the final stage of the encoder is a 
Huffman coder. This will, of course, increase the complexity of the 
encoder and may make the system more vulnerable to channel errors. 
Therefore, if at all possible we will avoid using a Huffman coder. 

Decoder 


The decoder for the proposed system consists of three stages. The
first stage of the proposed system decoder is a maximum a posteriori
probability (MAP) receiver. The MAP receiver design is based on the
assumption that the output of the encoder contains dependencies.

The MAP design criterion can be formally stated as follows: For a
discrete memoryless channel (DMC), let the channel input alphabet be
denoted by $A = \{a_0, a_1, \ldots, a_{M-1}\}$, and the channel input and output
sequences by $Y = \{y_0, y_1, \ldots, y_{L-1}\}$ and $\hat{Y} = \{\hat{y}_0, \hat{y}_1, \ldots, \hat{y}_{L-1}\}$,
respectively. If $\mathcal{A} = \{A_i\}$ is the set of sequences
$A_i = \{a_{i,0}, a_{i,1}, \ldots, a_{i,L-1}\}$, $a_{i,k} \in A$, then the optimum receiver (in the sense
of maximizing the probability of making a correct decision) maximizes
$P[C]$, where

$$P[C] = \sum_{\hat{Y}} P[C \mid \hat{Y}]\, P[\hat{Y}].$$

This in turn implies that the optimum receiver maximizes $P[C \mid \hat{Y}]$. When
the receiver selects the output to be $A_k$, then $P[C \mid \hat{Y}] = P[Y = A_k \mid \hat{Y}]$.
Thus, the optimum receiver selects the sequence $A_k$ such that

$$P[Y = A_k \mid \hat{Y}] \ge P[Y = A_j \mid \hat{Y}] \quad \forall j \ne k.$$

When the channel input sequence is independent, this simplifies to the
standard MAP receiver. Under conditions where this is not true,
the receiver becomes a sequence estimator which maximizes the path
metric $\sum_i \log P[y_i \mid \hat{y}_i, y_{i-1}]$ (5). The path metric can be computed for a
particular system by rewriting it using the following relationship:

$$P[y_i = a_j \mid \hat{y}_i = a_n, y_{i-1} = a_m] = \frac{P[\hat{y}_i = a_n \mid y_i = a_j]\, P[y_i = a_j \mid y_{i-1} = a_m]}{\sum_l P[\hat{y}_i = a_n \mid y_i = a_l]\, P[y_i = a_l \mid y_{i-1} = a_m]}$$

Notice that the right hand side consists of two sets of conditional
probabilities, $\{P[\hat{y}_i \mid y_i]\}$ and $\{P[y_i \mid y_{i-1}]\}$. The first set of
conditional probabilities are the channel transition probabilities
while the second depend only on the encoder output. The two are
combined according to the above relationship to construct an M x M x M
lookup table for use in decoding. The structure of the MAP receiver
is that of the Viterbi decoder (4,5).
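
The table construction and the sequence search can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' decoder: symbols are represented by their indices 0..M-1, the function names are hypothetical, and the symbol preceding the first received one is assumed known (`start`), a detail the text does not specify.

```python
import math

def build_map_table(channel, transition):
    """table[m][n][j] = P[y_i = a_j | yhat_i = a_n, y_{i-1} = a_m].

    channel[j][n]    = P[yhat_i = a_n | y_i = a_j]   (channel transitions)
    transition[m][j] = P[y_i = a_j | y_{i-1} = a_m]  (encoder output statistics)
    """
    M = len(channel)
    table = [[[0.0] * M for _ in range(M)] for _ in range(M)]
    for m in range(M):
        for n in range(M):
            joint = [channel[j][n] * transition[m][j] for j in range(M)]
            total = sum(joint)
            if total > 0.0:
                table[m][n] = [p / total for p in joint]
    return table

def map_decode(received, table, start):
    """Viterbi search maximizing sum_i log P[y_i | yhat_i, y_{i-1}]."""
    M = len(table)

    def log(p):
        return math.log(p) if p > 0.0 else float("-inf")

    # Metric of each candidate for the first symbol, given the assumed start.
    score = [log(table[start][received[0]][j]) for j in range(M)]
    back = []
    for n in received[1:]:
        new_score, pointers = [], []
        for j in range(M):
            best_m = max(range(M), key=lambda m: score[m] + log(table[m][n][j]))
            new_score.append(score[best_m] + log(table[best_m][n][j]))
            pointers.append(best_m)
        score = new_score
        back.append(pointers)
    # Trace back from the best final symbol.
    j = max(range(M), key=lambda k: score[k])
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    return path[::-1]
```

When the encoder output is strongly correlated, an isolated channel error produces a low-probability transition into and out of the wrong symbol, so the search prefers the path that treats it as an error.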

The second stage of the decoder is the inverse operation of the
modified run-length encoder. The operation of this stage has already
been described in the previous section. The final stage of the
decoder is the inverse of the differential operation with an
input/output relationship

$$x_n = z_n + \lfloor a x_{n-1} \rfloor$$
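
The leaky differencer and its inverse can be sketched as a matched pair. This is a minimal illustration: the function names are hypothetical, the value preceding the first bin is assumed to be zero, and the leak factor $a$ is left as a parameter because its value is not stated in the text.

```python
import math

def leaky_difference(x, a):
    """Encoder stage: z_n = x_n - floor(a * x_{n-1}), with x_{-1} taken as 0."""
    z, prev = [], 0
    for value in x:
        z.append(value - math.floor(a * prev))
        prev = value
    return z

def leaky_reconstruct(z, a):
    """Decoder stage: x_n = z_n + floor(a * x_{n-1}), with x_{-1} taken as 0."""
    x, prev = [], 0
    for value in z:
        prev = value + math.floor(a * prev)
        x.append(prev)
    return x
```

Because both sides apply the same floor to the same previous value, the reconstruction is exact in the absence of channel errors. With $0 < a < 1$, an error in one reconstructed value is scaled down at each subsequent step, which is the decay the leak is meant to provide.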


RESULTS 

In this section we present results obtained by using the proposed
system of the previous section. The data used was provided by Ms.
M. Mingarelli-Armbruster of the Goddard Space Flight Center. This
data was generated according to a Poisson distribution where the
Poisson parameter was obtained from ten hours of lunar data. Both
noisy and noiseless channel performance of the proposed system were
examined via Monte-Carlo simulation. A total of twenty 30-second
intervals were used in the tests. The performance was compared with
the Rice algorithm (1-3).

Before proceeding with the results, some caveats are in order. First,
the name Rice algorithm is a misnomer. What is presented in (1-3) is not
an algorithm but an approach. In this approach, a suite of algorithms
is used to encode sections of the data, and the most efficient
algorithm for that particular section of data is selected. In this
way, data with very different statistical profiles can be
accommodated. Thus what is presented in (1-3) could more correctly be
called the Rice Universal Coding Approach (RUCA). What we compare
against here are algorithms presented in (1-3) as examples of the RUCA.
These algorithms were constructed for use in very general situations.
As opposed to this, the particular algorithm presented here has been
designed for a specific task. A final observation is that the encoder
presented in this paper could very easily be used as the first stage
of the RUCA. However, this would result in a rather complex encoder
and substantial increase in the need for onboard memory over the
proposed design. Therefore, if the algorithm presented in the
previous section satisfies the requirements in terms of rate and
robustness, such a step would be undesirable.

The results of the tests with both algorithms are presented in Table 1
and Table 2. The number of bits required to code twenty thirty-second
intervals and the average rate needed for both algorithms are presented
in Table 1. The second and third columns contain the total number of
bits and the rate when the Rice algorithm is used. The average rate
over twenty intervals is 719 bits per second. The remaining four columns
present the results obtained by using the proposed algorithm. The
first two of these columns contain the results for the case where the
Huffman coder was not used, while the last two columns contain the
results for when the Huffman coder formed the last stage of the
encoder. The rate without the Huffman coder averaged over twenty
intervals is 595 bits per second, while the average rate when the
Huffman coder is used is 522 bits per second. These results indicate
that the proposed system will satisfy the specifications (coding rate
below 600 bits per second) both when the Huffman coder is used and when
it is not. As both systems meet the target and as the inclusion of the
Huffman coder increases both the complexity and the vulnerability of
the system to channel noise, we elected to use the system without the
Huffman coder.

Table 2 provides the performance of the algorithms under noisy channel
conditions. Three performance measures are used, namely, mean squared
error (MSE), mean absolute error (MAE), and the number of decoded
values which are in error. Note the very large difference between the
performance of the Rice algorithm and the proposed algorithm. Also,
the proposed algorithm maintains a robust performance at extremely
high error rates. In fact, under even highly adverse conditions the
mean squared error is almost constant, and the number of erroneous
decoded values is about 25% of the total. However, the performance
of the algorithms at high error rates may be irrelevant in this
particular situation. The reason is that the transmitted data will
be well protected by a channel coding scheme consisting of a
Reed-Solomon coder followed by a convolutional coder. This combination is
expected to keep the average probability of error on the coded channel
below $9 \times 10^{-6}$.

Finally, we examine the relative complexity and buffer requirements
for the two algorithms. The proposed algorithm can be easily realized
with a simple program implemented using a microprocessor. Based on
the memory requirements for the simulation program used in this study,
the memory needed for actual implementation should be about [illegible]. The
only time buffering may be required is when a large differencer output
is encountered, and the encoder has to generate several channel
symbols for one input. Depending on the way the entire system is
implemented, the buffer requirements could range from a single symbol
buffer to perhaps a sixteen symbol buffer.

As opposed to this, the Rice algorithm, by its very nature being a
universal coding algorithm, is quite complex. Each block of data is
encoded using a number of candidate algorithms; the algorithm which
provides the most efficient encoding is then selected. Each of the
candidate algorithms is itself relatively complex, though some very
ingenious techniques are used to make subunits of one algorithm common
to several candidate algorithms. Because several passes are required
to do the encoding, the buffering requirements for this approach are
substantial.

These differences in complexity are very natural based on the
different objectives of the two algorithms. The proposed system is
designed for a very specific situation while the Rice algorithm is
designed to handle general situations.

TABLE 1

Coding rates with the Rice Algorithm and the Proposed Algorithm. (HC)
denotes the results for the case where the Huffman Coder was used.

            RICE ALGORITHM                 PROPOSED ALGORITHM
INTERVAL    TOTAL      RATE      TOTAL      RATE      TOTAL BITS   RATE
            BITS                 BITS                 (HC)         (HC)

    1       21,647     721.6     17,832     594.4     15,733       524.4
    2       21,385     712.8     17,528     584.3     15,345       511.5
    3       21,530     717.7     17,784     592.8     15,520       517.3
    4       21,562     718.7     17,840     594.7     15,691       523.0
    5       21,666     722.2     18,144     604.8     15,883       529.4
    6       21,424     714.1     17,504     583.5     15,457       515.2
    7       21,841     728.0     18,048     601.6     15,882       529.4
    8       21,630     721.0     18,096     603.2     15,907       530.2
    9       21,719     723.9     18,132     604.4     15,843       528.1
   10       21,568     718.9     18,096     603.2     15,695       523.2
   11       21,308     710.3     17,604     586.8     15,438       514.6
   12       21,509     716.9     17,728     590.9     15,580       519.3
   13       21,633     721.1     17,780     592.7     15,581       519.4
   14       21,822     727.4     18,016     600.5     15,913       530.4
   15       21,296     709.8     17,564     585.4     15,361       512.0
   16       21,701     723.4     17,956     598.5     15,872       529.1
   17       21,058     701.9     17,296     576.5     15,139       504.6
   18       21,312     710.4     17,688     589.6     15,449       514.9
   19       21,713     723.8     18,160     605.3     16,033       534.4
   20       21,888     729.6     18,292     609.7     16,125       537.5

OVERALL
AVERAGE                                     595.1                  522.4

TABLE 2

Performance of the algorithms under noisy channel conditions.

RICE ALGORITHM

PROBABILITY    MEAN SQUARED    MEAN ABSOLUTE    # OF DECODED
OF ERROR       ERROR           ERROR            ERRORS

10^-6          0.0760          0.023            140
10^-5          4.07            0.45             1,908
10^-4          31.49           3.14             10,177
10^-3          479.22          16.03            15,658
10^-2          8,562.87        76.75            16,189

PROPOSED ALGORITHM

PROBABILITY    MEAN SQUARED    MEAN ABSOLUTE    # OF DECODED
OF ERROR       ERROR           ERROR            ERRORS

10^-6          2.4 x 10^-5     1.2 x 10^-5      1
10^-5          0.026           0.016            218
10^-4          0.17            0.14             1,287
10^-3          0.78            0.28             2,944
10^-2          6.81            0.71             3,765

SUMMARY AND CONCLUSIONS

We have presented a robust noiseless encoding scheme for encoding
gamma ray spectroscopy data. The encoding algorithm is simple to
implement and has minimal buffering requirements. The decoder
contains error correcting capability in the form of a MAP receiver.
While the MAP receiver adds some complexity, this is limited to the
decoder. Nothing additional is needed at the encoder side for its
functioning.

ic distortions present in the coded image. But a casual viewer who spent a few seconds of time in
scrutinizing the coded image did not see the distortions. Given that edges are of such great importance, it
makes sense that regions containing edges be quantized with higher fidelity. However, to do so generally
requires a larger codebook, which is not feasible with an LBG VQ

A Robust Coding Scheme for Packet Video

Y.C. Chen, K. Sayood and D. J. Nelson

Abstract

We present a layered packet video coding algorithm based on a progressive
transmission scheme. The algorithm provides good compression and can handle
significant packet loss with graceful degradation in the reconstruction sequence.
Simulation results for various conditions are presented.


I. INTRODUCTION 

Due to the rapid evolution in the fields of image processing and networking, video information
will be an important part of tomorrow's telecommunication system. Up to now, video
has been mainly transported over circuit-switched networks. It is quite likely that packet-switched
networks will dominate the communications world in the near future. Asynchronous transfer mode
(ATM) techniques in broadband-ISDN can provide a flexible, independent and high performance
environment for video communication. Therefore, it is necessary to develop techniques for video
transmission over such networks.

The classic approach in circuit switching is to provide a "dedicated path," thus reserving 
a continuous bandwidth capacity in advance. Any unused bandwidth capacity on the allocated 
circuit is therefore wasted. Rapidly varying signals, like video signals, require too much 
bandwidth to be accommodated by a standard circuit-switching channel. With a certain amount 
of capacity assigned to a given source, if the output rate of that source is larger than the channel 
capacity, quality will be degraded. If the generating rate is less than the available capacity, 
the excess channel capacity is wasted. The use of packet networks allows for the utilization
of channel sharing protocols between independent sources and can improve channel utilization.
Another point that strongly favors packet-switched networks is the possibility that the integration
of services in a network will be facilitated if all of the signals are separated into packets with
the same format.

Some coding schemes which support packet video have been explored. Verbiest and Pinnoo
proposed a DPCM-based system which is comprised of an intrafield/interframe predictor, a
nonlinear quantizer, and a variable length coder[1]. Their codec obtains stable picture quality
by switching between three different coding modes: intrafield DPCM, interframe DPCM, and no
replenishment. Ghanbari has simulated a two-layer conditional replenishment codec with a first
layer based on hybrid DCT-DPCM and second layer using DPCM[2]. This scheme generates
two types of packets: "guaranteed packets" contain vital information and "enhancement packets"
contain "add-on" information. Darragh and Baker presented a sub-band codec which attains
a user-prescribed fidelity by allowing the encoder’s compression rate to vary[3]. The codec’s
design is based on an algorithm that allocates distortion among the sub-bands to minimize
channel entropy. Kishino et al. describe a layered coding technique using discrete cosine
transform coding, which is suitable for packet loss compensation[4]. Karlsson and Vetterli
presented a sub-band coder using DPCM with a nonuniform quantizer followed by run-length
coding for baseband and PCM with run-length coding for nonbaseband[5]. In this paper, a
different coding scheme based on a progressive transmission scheme called Mixture Block
Coding with Progressive Transmission (MBCPT) [6,7] is investigated. Unlike those methods
mentioned above, MBCPT doesn't use decimation and interpolation filters to separate the signals
into sub-bands. However, it does have the attractive property of dealing separately with high
frequency and low frequency information. This separation is obtained by the use of variable
blocksize transform coding.

This paper is organized as follows. First, some of the important characteristics and 
requirements of packet video are discussed. In Section 3, the coding scheme called Mixture
Block Coding with Progressive Transmission (MBCPT) is presented. In Section 4, a network
simulator used in testing the scheme is introduced. In Section 5, the simulation results are 
discussed. Finally, in Section 6 the paper is summarized. 


II. CHARACTERISTICS OF PACKET VIDEO 

The demand for various services, such as telemetry, terminal and computer connections, voice 
communications, and full-motion high-resolution video, along with the wide range of bit rates and 
holding times they represent, provides an impetus for building a Broadband Integrated Services
Digital Network (B-ISDN). B-ISDN is a projected worldwide public telecommunications network
that will service a wide range of user needs. The continuing advances in the technology of optical
fiber transmission and integrated circuit fabrication have been driving forces to realize B-ISDN.
The idea of B-ISDN is to build a complete end-to-end switched digital telecommunication network
with broadband channels. With fiber transmission, the H4 channel, still to be precisely defined
by CCITT, has an access rate of about 135 Mbps.

Packet-switched networks have the unique characteristics of dynamic bandwidth allocation 
for transmission and switching resources, and the elimination of channel structure. They 
acquire and release bandwidth as needed. Because video signals vary greatly in bandwidth
requirement, it is attractive to utilize a packet-switched network for video coded signals. Allowing
the transmission rate to vary, video coding based on packet transmission permits the possibility
of keeping the picture quality constant, by implementing "bandwidth on demand". There are
three main merits when transmitting video packets over a packet-switched network:

1. Improved and consistent image quality: if video signals are transmitted over fixed-rate 
circuits, there is a need to keep the coded bit rate constant, resulting in image degradation 
accompanying rapid motion. 

2. Multimedia integration: as mentioned above, integrated broadband services can be 
provided using unified protocols. 

3. Improved transmission efficiency: using variable bit-rate coding and channel sharing
among multiple video sources, scenes can be transmitted without distortion if other
sources, at the same time, are without rapid motion.

However, video transmission over packet networks also has the following drawbacks:


1. The time taken to transmit a packet of data may change from time to time. 

2. Packets may be delayed to the point where, because of constraints due to the Human
Visual System, they have to be discarded.

3. Headers of packets may be changed because of errors and delivered to the wrong receiver. 

It has to be emphasized that the delay/loss effect can reach very high levels if the combined
users’ requirement exceeds the available bandwidth and may seriously damage the quality of
the image.

When the signals transmitted in the network are nonstationary and circuit-switching is used
with limited bandwidth, a buffer between the coder and the channel is needed to smooth out
the varying rate. If the amount of data in the buffer exceeds a certain threshold, the encoder is
instructed to switch into a coding mode that has a lower rate but worse quality to avoid buffer
overflow. In packet-switched networks, Asynchronous Time Division Multiplexing (ATDM) can
efficiently absorb temporal variations of the bit-rate of individual sources by smoothing out the
aggregate of several independent streams in the common network buffers[8].


To deliver packets in a limited time and provide a real time service is a difficult resource
allocation and control problem, especially when the source generates a high and greatly varying
rate. In packet-switched networks, packet losses are inevitable, but use of a packet-switched
network yields a better utilization of channel capacity. However, it should be noted that the
varying rate requirements of the video coder may not be synchronized with the variations in
available channel capacity, which changes depending on the traffic in the network. Therefore,
the interactions between the coder and the network have to be considered and be incorporated
into the requirements for the coder. These requirements include:

1. Adaptability of the coding scheme: The video source we are dealing with has a varying
information rate. So it is expected that the encoder should generate different bit rates by
removing the redundancy. When the video is still, there is no need to transmit anything.

2. Insensitivity to error: The coding scheme has to be robust to packet loss so that
the quality of the image is never seriously damaged. Remember that retransmission is
impossible because of the tight timing requirement.

3. Resynchronization of the video: Because of the varying packet-generating rate and the lack
of a common clock between the coder and the decoder, we have to find a way to
reconstruct the received data which is synchronous to the display terminal.

4. Control of coding rate: Sensing the heavy traffic in the network, the coding scheme is 
required to adjust the coding rate by itself. In the case of a congested network, the 
coder could be switched to another mode which generates fewer bits with a minimal 
degradation of image quality. 

5. Parallel architecture: The coder should preferably be implemented in parallel. That allows
the coding procedure to be run at a lower rate in many parallel streams.

In the next section, we investigate a coding scheme to see how well it satisfies the above
requirements.


III. MIXTURE BLOCK CODING WITH PROGRESSIVE 
TRANSMISSION 


Mixture Block Coding (MBC) is a variable-blocksize transform coding algorithm which
codes the image with different blocksizes depending upon the complexity of that block area.
Low-complexity areas are coded with a large blocksize transform coder while high-complexity
regions are coded with small blocksize. The complexity of the specific block is determined by
the distortion between the coded and original image when the same number of bits are used to
code each block. A more complex image block has higher distortion. The advantage of using
MBC is that it does not process different complex regions with the same blocksize. That means
MBC has the ability to choose a finer or coarser coding scheme to deal with different complex
parts of the same image. With the same rate, MBC is able to provide an image of higher quality
than a coding scheme which codes different complex regions with the same blocksize coder.


When using MBC, the image is divided into maximum blocksize blocks. After coding, the
distortion between the reconstructed and original block is calculated. The block being processed
is subdivided into smaller blocks if that distortion fails to meet the predetermined threshold. The
coding-testing procedure continues until the distortion is small enough or the smallest blocksize
is reached. In this scheme, every block is coded until the reconstructed image is satisfactory
and then the coder moves to the next block.
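
The coding-testing procedure described above can be sketched as a recursive subdivision. The following Python fragment is only an illustration under stated assumptions: the block mean stands in for the transform coder (which is not specified here), the distortion measure is taken to be mean squared error, and the names `mbc_split`, `block_mean` and `block_mse` are hypothetical.

```python
def block_mean(image, x, y, size):
    """Mean gray level of the size-by-size block whose top-left corner is (x, y)."""
    total = sum(image[r][c] for r in range(y, y + size) for c in range(x, x + size))
    return total / (size * size)


def block_mse(image, x, y, size, value):
    """Mean squared error when the whole block is represented by a single value."""
    err = sum((image[r][c] - value) ** 2
              for r in range(y, y + size) for c in range(x, x + size))
    return err / (size * size)


def mbc_split(image, x, y, size, threshold, min_size):
    """Return the leaf blocks as (x, y, size, value) tuples.

    Assumption: the block mean replaces the real transform coder.
    """
    value = block_mean(image, x, y, size)
    # Stop when the distortion is small enough or the smallest blocksize is reached.
    if size <= min_size or block_mse(image, x, y, size, value) <= threshold:
        return [(x, y, size, value)]
    half = size // 2
    leaves = []
    for dy in (0, half):
        for dx in (0, half):
            leaves += mbc_split(image, x + dx, y + dy, half, threshold, min_size)
    return leaves
```

Calling `mbc_split(image, 0, 0, 16, threshold, 2)` on each maximum-size block would yield large leaves in flat areas and small leaves around detail, which is the behavior the text attributes to MBC.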


Mixture Block Coding with Progressive Transmission (MBCPT) is a coding scheme which
combines MBC and progressive coding. Progressive coding is an approach that allows an initial
image to be transmitted at a lower bit rate which can later be updated[9]. In this way, successive
approximations converge to the target image with the first approximation carrying the "most"
information and the following approximations enhancing it. The process is like focusing a lens,
where the entire image is transformed from low-quality into high-quality. In progressive coding,
every pixel value, or the information contained in it, is possibly coded more than once and the
total bit rate may increase due to the different coding schemes and quality desired. Because only the
gross features of an image are being coded and transmitted in the first pass, [...]
is greatly reduced for the first pass and a coarse version of the image can be displayed without
significant delay. It has been shown that it is perceptually useful to get a crude image in
time, rather than waiting a long time to get a clear complete image.

[...] image to suffer from quality degradation rather than total loss of parts of the images.

MBCPT is a multipass scheme in which each pass deals with different blocksizes [...]
coefficients. The initial threshold of each pass is selected beforehand and is readjustable during
the operation according to the channel condition and quality required.

Because only those blocks which fail to meet the distortion threshold need to be coded,
side information is needed to instruct the receiver on how to reconstruct the image. One bit
of overhead is needed for each block. If a block is to be divided, a 1 is assigned to be its
overhead; if not, a 0 is assigned. The example shown in Fig. 8 has the following overhead:
1,1001,1001,1001,1001,1001.
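
As an illustration of the one-bit-per-block rule, the sketch below emits the overhead bits pass by pass for a small quad tree. It is a hypothetical Python fragment, not the encoder used here: the tree representation (0 for an undivided block, a list of four subtrees for a divided block) and the name `overhead_by_pass` are assumptions, and no attempt is made to reproduce the overhead string of Fig. 8.

```python
def overhead_by_pass(tree):
    """Return one bit string per pass for a quad tree.

    tree is 0 for a block that is not divided, or a list of four
    subtrees for a block that is divided (assumed representation).
    """
    passes = []
    level = [tree]
    while level:
        bits = []
        next_level = []
        for node in level:
            if node == 0:
                bits.append("0")          # block met the threshold, not divided
            else:
                bits.append("1")          # block is divided into four sub-blocks
                next_level.extend(node)   # its children are handled in the next pass
        passes.append("".join(bits))
        level = next_level
    return passes


# One maximum-size block whose first and fourth quadrants are divided again.
example = [[0, 0, 0, 0], 0, 0, [0, 0, 0, 0]]
print(overhead_by_pass(example))
```

In a practical coder the blocks of the smallest blocksize cannot be divided further, so their overhead bits could presumably be omitted; the sketch keeps them for simplicity.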


The interframe coder used in this paper is a differential scheme which is based on MBCPT.
This coder processes the difference image coming from the current frame and the previous
frame which is locally decoded from the first three pass data. Fig. 9 shows the algorithm
of this coder. Fig. 10 shows a different scheme which does the local decoding with all four
passes. From Fig. 11, it can be seen that when there is no packet loss, the performances of
these two schemes are quite the same. But when congestion occurs in the network, with the
priorities assigned to packets, packets from pass 4 are expected to be discarded first. In this
case, the performance (from Fig. 12) of the scheme in Fig. 9 is much better than the one in
Fig. 10. Therefore the coding scheme in Fig. 9 is used in our simulation. In this paper, the
Kronkite motion sequence from the USC database with 16 frames is used as the simulation
source. Every image is 256x256 pixels with graylevels ranging from 0 to 255. It is similar to a
video conferencing type image which has neither rapid motion nor scene changes. Due to this
characteristic, advanced techniques like motion detection or motion compensation have not been
used but could be implemented when broadcasting video.

From the datastream output that is listed in Table 1, we can see that the data in pass 4
represents 30-40% of the entire data. This part of the data is involved in increasing the sharpness
of the image and is usually labeled with the lowest priority in the network. We therefore call this the
least significant pass (LSP). With a substantial possibility of being discarded due to low priority,
those packets from pass 4 won’t be used to reconstruct the locally decoded image and be stored
in the frame memory. This prevents the packet loss error from propagating into following frames if
the lost packet belongs to pass 4.


IV. SIMULATION NETWORK 

The network simulator used for this study was a modified version of an existing simulator 
developed by Nelson et al.[13]. A brief description of the simulator is provided here. 


A. Introduction 

As mentioned in section 2, tomorrow’s integrated telecommunication network is a very
complicated and dynamic structure. Its efficiency requires sophisticated monitoring and control
algorithms with communication between nodes reflecting the existing capacity and reliability of
system components. The scheme for communicating information regarding the operating status
is called the system protocol. Since the communication of system information must flow through
the channel, it reduces the overall capacity of the physical layers, but hopefully provides a more
efficient system overall. Therefore, system efficiency depends entirely upon these protocols,
which, in turn, depend upon the system topology, communication channel properties, nodal
memory and component reliability. Most network protocols have been developed to provide high
reliability in topological structures with reasonably high channel reliability.

To fit the purpose of this study, most modifications which were made to the
simulator were in those modules concerning the network layer. Since the simulator is structured
in modules which represent, to some degree, the ISO Model for packet switched networks, a
more detailed description of the network layer modules follows.


B. The Network Layer and Basic Operation 

The simulation of a layer at each node is represented by a "processor" and one or more
"packet queues." All events are scheduled through the "Sim_Q" which drives the simulator.
Initially, the processors are all idle, the packet queues are all empty and the only tasks scheduled
are the arrival of messages at the various nodes. The simulator operation occurs by examining
the next event and performing the task indicated. The task may result in the scheduling of
additional events, generally referred to as task completion times. When a message or packet is
placed in the input queue at a node for a given layer, the processor for that queue is marked
as busy, the packet is removed from the queue, and the task to be performed by the processor
is scheduled for completion. When the task is completed (as a result of the simulator reaching
that point in time), the "processor" examines the queue. If the queue is empty, the processor
is set idle; otherwise it removes the next message or packet from the queue and schedules the
completion of the operation which must be performed. The layers in the simulator are quite
close in operation to the ISO transport, network and datalink layers.
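
The processor and queue behavior described above can be sketched as a small event-driven loop. The Python fragment below is a minimal illustration for a single layer at a single node, not the simulator of Nelson et al.: only the name Sim_Q comes from the text, while `run_layer`, the event labels and the fixed service time are assumptions.

```python
import heapq
from collections import deque


def run_layer(arrival_times, service_time):
    """Simulate one processor with one packet queue; return task completion times."""
    # Sim_Q holds (time, sequence number, event) and always yields the next event.
    sim_q = [(t, i, "arrive") for i, t in enumerate(arrival_times)]
    heapq.heapify(sim_q)
    seq = len(arrival_times)
    queue = deque()       # packet queue of this layer
    busy = False          # processor state: idle at the start
    completions = []
    while sim_q:
        now, _, event = heapq.heappop(sim_q)
        if event == "arrive":
            queue.append(now)            # packet is placed in the input queue
        else:
            completions.append(now)      # task completion time reached
            busy = False
        if not busy and queue:
            queue.popleft()              # processor takes the next packet
            busy = True
            heapq.heappush(sim_q, (now + service_time, seq, "done"))
            seq += 1
    return completions


print(run_layer([0.0, 1.0, 1.5], 2.0))
```

Message arrivals with exponentially distributed gaps (a Poisson arrival process) could be generated with `random.expovariate` and passed in as `arrival_times`.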


(1) The Session Layer 

In the OSI model, the session layer (SL) allows users to establish "sessions" on local or
remote systems. In the simulator, as mentioned above, it contains a relatively simple model
of the subscribers, participates in flow-control, and acts as a statistics collector for messages
arriving and delivered. At message arrival time (from Sim_Q), the session layer generates a
"message" with all of its randomly selected attributes and, if flow control or node hold-down
are not in effect, submits it to the transport layer. It then schedules the next message arrival
time. During initialization, the task "SL_Rcv_Msg" for each node is queued in Sim_Q for the
arrival time of the first message at that node. When this task is executed by the simulator, a
message packet is generated and placed in the transport queue. The arrival of the next message
is then queued in Sim_Q with the same task and with an arrival time determined by the random
number generator (Poisson distributed). The only other task performed by the session layer is
the "SL_Snd_Msg" task that simulates delivery of messages to the subscribers, develops message
statistics and "cleans up" the queues for messages delivered.


(2) The Transport Layer

The basic function of the transport layer at the sending end is to receive the message from the
session layer, place it in packets and pass the packets on to the network layer. At the receiving
end, the packets are reassembled into a message for delivery to the session layer. To accomplish
the complex task of assuring reliable delivery, there is a transport time-out mechanism at both
the sending and receiving nodes and a message acknowledgement packet that is sent to the
sending node when all packets for the message have been satisfactorily received. At the sending
end, if a message acknowledgement is not received in the allotted time period, the message can
be retransmitted. In the simulations reported in this paper, the retransmission feature was not
used. At the receiving end, if all packets are not received in the specified period of time, the
entire message is discarded. It is recognized that in some networks, packetization takes place
at the network level, leaving the transport layer responsible only for message-level structures.
Reassembly, depending upon the protocol, can take place as low as the datalink level. These
tasks were both placed in the transport layer, but are modular, and could be extracted and
placed elsewhere. Also, the simulator was originally designed for datagram service, and since
the packets do not necessarily arrive in order, it is unlikely that assembly would take place
at the datalink level.


(3) The Network Layer 

The network layer is concerned with controlling the operation of the network. A key design
issue is determining how packets are routed from source to destination. Another issue is how to
avoid the congestion caused when too many packets are presented to the network at the same
time. In the simulator, the network layer performs all of the functions related to these two
aspects with the exception of that aspect of flow control which takes place at the session layer,
and the recovery protocols which require some service from the datalink layer. It also activates
new channels when needed and determines when packets originating at other nodes are to be
discarded. The network layer is currently the most dynamic with regard to the coding of modules.
Five modules currently comprise the network layer. These include three relatively static modules: one
module for capturing lines or channels when more capacity is required and releasing them when
they are not needed, one module for the network processor and queue handling, and one module
for the routines which are common to most routing algorithms. This leaves two modules for the
dynamic parts of the routing and flow control algorithms.


(4) The Datalink Layer 

The main task of the datalink layer is to take a raw transmission facility and transform it 
into a line or channel that appears free of transmission errors to the network layer. It simulates 
the sending of the message over the channel and the delivery at the other end. When a packet is 
received, the datalink acknowledgement is initiated either by the piggyback acknowledgement 
or by generating a datalink acknowledgement packet. As mentioned previously, the datalink 
level also simulates the physical layer on a statistical basis. (Entered bit error rates are used in 
conjunction with a random number generator to determine if messages are corrupted.) When 
a line is "brought up", health packets are used to establish initial connections. Also, when a
line "goes down", an active node will immediately issue health check packets to ascertain when
the channel is again available.
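
The statistical treatment of the physical layer mentioned above, in which an entered bit error rate and a random number generator decide whether a message is corrupted, can be sketched as follows. This Python fragment is illustrative only; it assumes independent bit errors, and the names `packet_error_probability` and `is_corrupted` are hypothetical rather than taken from the simulator.

```python
import random


def packet_error_probability(ber, n_bits):
    """Probability that at least one of n_bits is in error (independent errors assumed)."""
    return 1.0 - (1.0 - ber) ** n_bits


def is_corrupted(ber, n_bits, rng):
    """Draw one random number to decide whether this packet is corrupted."""
    return rng.random() < packet_error_probability(ber, n_bits)


rng = random.Random(1)  # fixed seed so that a simulation run is repeatable
corrupted = sum(is_corrupted(1e-5, 8000, rng) for _ in range(1000))
print(corrupted, "of 1000 packets corrupted")
```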


C. Modifications 


A major problem of using this system as a simulation tool for the study of packet video
is that, as initially designed, the system did not actually transmit messages from node to node.
While a "packet" carrying all the necessary describing information moved from node to node,
there was no actual data in the packet. Therefore, modifications had to be made to the simulator
to accommodate the video data. In the sending node, a field called "Image" which contains
real image data is attached to the record "Packet_Ptr" allocated to the message generated in
the session layer. There are three new modules in this layer. First, "Get_Image" puts the
image data into the image field of a message generated at a specific time and node. Second,
"Image_Available" checks to see if there is any image data that still needs to be transmitted. If
that is true, the following message, generated at that specific node, is still the image message and
contains some image data. Third, "Receive_Image" collects the image data in the session layer
of the receiving node when the flag "Image_Complete" is on. In module "Session_Msg_Arrive",
different priorities are assigned to different messages. In module "Session_Msg_Send", some
statistics are calculated including the number of lost image packets and the transmission delay
for image packets.

In the original design, the transport layer simply duplicated the same packet with different
assigned sequential packet numbers without actually packetizing the message. The module
"Transport_Packetize" has been modified to really packetize the image data which resides in the
message record queued in "Transport_Q" when it is called. The module "Transport_Reassemble"
is called to reassemble these image packets according to their packet number when the flag
"Image_Content" defined in "Packet_Ptr" is true. The network layer is responsible for routing
and flow-control. This module was already very well developed, so the modifications to be
performed here were relatively minor. In the datalink layer, in order to simulate the delivery of
packets through the channel, a new packet is generated at the receiving node and the information
including the image data from the transmitted packet (which will still be resident at the sending
node) is copied into it. Using existing bit-error-rates, the transmission success rate can be set
and bit errors can be inserted in both the data and control bits in the packet. Errors in the control
bits are simulated separately as long as the error rates are consistent. If an error in control bits
occurs, the transmission is assumed to fail and retransmission will occur, again depending on
the threshold of the timeout number. In addition to the modifications made to the layer modules,
we had to arrange some new memory elements allocated for image messages and packets. In
order to make sure the simulation is run in the steady state, the image data is made available
to the network after some simulation time has passed.


V. INTERACTION OF THE CODER AND THE NETWORK

When the video data is packed and sent into a nonideal network, some problems emerge.
These are discussed in the following section.


A. Packetization 

The task of the packetizer is to assemble video information, coding mode information,
if it exists, and synchronization information into transmission cells. In order to prevent the
propagation of the error resulting from packet loss, packets are made independent of each
other and no data from the same block or same frame is separated into different packets. The
segmentation process in the transport layer has no information regarding the video format. To
avoid the bit stream being cut randomly, the packetization process has to be integrated with the
encoder, which is in the presentation layer of the user’s premise. Otherwise, some overhead has
to be added into the datastream to guide the transport layer to perform the packetization in the
desired manner. In order to limit the delay of packetization, it is necessary to stuff the last cell
of a video packet with dummy bits if the cell is not completely full.

Every packet must contain an absolute address which indicates the location of the first block
it carries. Because every block in MBCPT has the same number of bits in each pass, there is no
need to indicate the relative address of the following blocks contained in the same packet. There
always exists a tradeoff between packaging efficiency and error resilience. If error resilience
is a considerable concern, one packet should contain a smaller number of blocks. However, since each
channel access by a station contains overhead, the packet length should be large for transmission
efficiency. Fixed length packetization is used in this paper for simplicity.
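
The addressing rule above can be illustrated with a short sketch. The Python fragment below is a simplified, hypothetical model: blocks are treated as opaque items of equal size, the block index plays the role of the absolute address, and the names `packetize` and `depacketize` are assumptions rather than modules of the simulator.

```python
def packetize(blocks, blocks_per_packet):
    """Group equal-size blocks into packets of (absolute address, payload)."""
    return [(start, blocks[start:start + blocks_per_packet])
            for start in range(0, len(blocks), blocks_per_packet)]


def depacketize(packets, total_blocks, fill=None):
    """Place each received payload at its absolute address; lost blocks stay as fill."""
    out = [fill] * total_blocks
    for start, payload in packets:
        out[start:start + len(payload)] = payload
    return out


blocks = list("abcdefg")
packets = packetize(blocks, 3)
received = [packets[0], packets[2]]        # the second packet is lost
print(depacketize(received, len(blocks)))
```

Because each packet carries its own address, the loss of one packet leaves only its own blocks unfilled. A smaller `blocks_per_packet` limits the damage of a single loss at the cost of more per-packet overhead, which is the tradeoff noted above.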

Because of the structure of the coding scheme, the packets are classified into four priorities,
with the packets from the first pass classified as the highest priority packets, and the packets
from the fourth pass as the lowest priority packets.
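
One way a congested node might use these priorities is sketched below. This is a hypothetical Python illustration, not the discard policy of the simulator: the function name `admit`, the fixed `capacity` and the representation of a packet as a (pass number, payload) pair are all assumptions.

```python
def admit(packets, capacity):
    """Keep at most `capacity` packets, discarding the lowest priority first.

    packets is a list of (pass_number, payload); a lower pass number
    means a higher priority. The original order of survivors is preserved.
    """
    # Stable sort of indices by pass number: earlier arrivals win ties.
    order = sorted(range(len(packets)), key=lambda i: packets[i][0])
    kept = set(order[:capacity])
    return [p for i, p in enumerate(packets) if i in kept]


arrivals = [(4, "a"), (1, "b"), (3, "c"), (4, "d"), (2, "e")]
print(admit(arrivals, 3))
```

With room for only three of the five packets, both pass 4 packets are dropped while the packets of passes 1 to 3 survive.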

This priority assignment also reflects the importance of the various packets to the
reconstruction of the image sequence at the receiver. Table 1 shows the effect of approximately
the same amount of packets lost in each pass on the reconstruction error in the received sequence.


B. Error Recovery 

There is no way to guarantee that packets won’t get lost after being sent into the network.
Packet loss can be mainly attributed to two problems. First, bit errors can occur in the address
field, leading the packets astray in the network. Second, congestion can exceed the network's
management ability and packets are forced to be discarded due to buffer overflow. Effects
created by higher pass packet loss (like pass 4) in MBCPT coding will be masked by the basic
passes and replaced with zeros. The distortion is almost invisible when viewing at video rates
because the lost area is scattered spatially and over time. However, low pass packet loss (like
pass 1), though rare due to high priority, will create an erasure effect due to packetization and
the effect is very objectionable.

Considering the tight time constraint, retransmission is not feasible in packet video. It
may also result in more severe congestion. Thus, error recovery has to be performed by the
decoder alone. In our differential MBCPT scheme, the packets from pass 4 are labeled lowest
priority and form a great part of the complete data. These packets can be discarded whenever
network congestion occurs. That will reduce the network congestion and won’t cause too much
degradation in quality. The erasures caused by basic pass loss are simply covered with the
reconstructed values from the corresponding area in the previous frame. This remedy seems
insufficient even when there is only a small amount of motion in that area. Motion detection
and motion compensation could be used to find a best matched area for replacement in the
previous frame.
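
The simple replenishment remedy described above, in which an erased area is covered with the corresponding area of the previous frame, can be sketched as follows. The Python fragment is illustrative only: frames are plain lists of rows, lost blocks are given as (block column, block row) indices, and the name `conceal` is hypothetical. No motion compensation is attempted.

```python
def conceal(current, previous, lost_blocks, size):
    """Return a copy of `current` with each lost block copied from `previous`.

    lost_blocks holds (bx, by) block indices; each block is size-by-size pixels.
    """
    out = [row[:] for row in current]   # leave the decoded frame untouched
    for bx, by in lost_blocks:
        for y in range(by * size, (by + 1) * size):
            for x in range(bx * size, (bx + 1) * size):
                out[y][x] = previous[y][x]
    return out


previous = [[1] * 4 for _ in range(4)]
current = [[9] * 4 for _ in range(4)]
print(conceal(current, previous, [(1, 0)], 2))
```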


Side information in the MBCPT decoding scheme is very important. So, this vital information
is not allowed to get lost. Two methods can be used for protection. First, error control coding,
like block codes or convolutional codes, can be applied in both directions, along with and
perpendicular to the packetization. The former is for bit errors in the data field while the latter
is for packet loss. The minimum distance that the error control coding should provide depends
on the network's probability of packet loss, correlation of such loss and channel bit error rate.
Second, from Table 2, we can see that the output rate of side information and pass 1 and
even pass 2 is quite steady. It seems feasible to reserve a certain amount of channel capacity
for these outputs to ensure their timely arrival. That means circuit-switching can be used for
important and steady data.


C. Flow Control 


In order to shield the viewer from severe network congestion, there are some flow control
schemes which are considered useful. If there is an interaction between the encoder and the
transport layer, then the encoder can be informed about the network condition. Depending on
that, the encoder can adjust its coding scheme. In the MBCPT coding scheme, if the buffer
is getting full, that means that the bit generating rate is overwhelming the packetization rate
and the encoder will switch to a coarse quantizer with fewer steps or loosen the threshold to
decrease its output rate. In this way, smooth quality degradation is obtainable. However, this
also complicates the encoder design.
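
A buffer-driven threshold adjustment of the kind described above might look like the following sketch. Everything numerical here is an assumption made for illustration: the watermarks `high` and `low`, the multiplicative `step`, and the name `adjust_threshold` do not come from the coder, and tightening the threshold when the buffer runs low is an added assumption, since the text only describes loosening it.

```python
def adjust_threshold(threshold, occupancy, high=0.8, low=0.3, step=1.25):
    """Return the distortion threshold to use for the next frame.

    occupancy is the buffer fullness between 0 and 1. A looser (larger)
    threshold causes fewer blocks to be subdivided, lowering the output rate.
    """
    if occupancy > high:
        return threshold * step   # buffer nearly full: loosen the threshold
    if occupancy < low:
        return threshold / step   # buffer nearly empty: tighten it (assumption)
    return threshold              # otherwise leave the threshold unchanged


print(adjust_threshold(10.0, 0.9))
```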


It is possible to use the congestion control of the network protocols to prevent drastic
quality changes by assigning different priorities to packets from different passes. Discarding
packets blindly, without identifying the importance of each packet, sometimes brings
disaster and can cause a session shutdown. For example, if the side information gets lost,
it can have a severe impact on the decoding process. In the MBCPT coding scheme, side
information and packets from pass 1 are assigned highest priority and higher pass packets are
assigned decreasing priority.


D. Interaction with Protocols

In the ISO model, the physical, datalink and network layers comprise the lower layers which
[...]. The higher layers are the transport, session, presentation and application layers [...]


VI. PERFORMANCE RESULTS

[...] can be accommodated in a variety of systems such as a token ring network or a circuit switched
network with a packet-switched overlay.

Fig. 15 shows the PSNR for each frame in the sequence. Notice that the standard deviation 
of the PSNR is only 0.2 dB, which implies a substantial uniformity of quality, at least in terms 
of objective performance measures. If constancy with regard to some subjective criterion is 
desired, it would be necessary to incorporate this in the determination of the thresholds and 
the decision mechanism for the quad tree. In the simulation, the same threshold has been used 
throughout the sequence. If further flexibility, say for higher visual quality, is desired, a varying 
threshold can be used for different frames. That may generate a more variable bit rate. 
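
For reference, the per-frame PSNR and its standard deviation over a sequence can be computed as in the sketch below. It uses the standard PSNR definition and assumes 8-bit pels (peak value 255) with each frame flattened to a list; the function names are illustrative only and are not taken from the simulation software.

```python
import math


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * math.log10(peak * peak / mse)


def std_dev(values):
    """Population standard deviation, e.g. of the per-frame PSNR values."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
```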


From the difference images of this sequence, frames 1-8 seem quite motionless while frames 
9-13 contain substantial motion. We adjusted the traffic condition of the network to force some 
of the packets to get lost and thus check the robustness of the coding scheme. Heavy traffic 
was set up in the motionless and motion periods separately. The average packet loss percentage 
was 3.3%, which is considered high for most networks. Fig. 16 shows images which suffered 
packet losses from pass 4. As can be seen, the effect of lost packets is not at all severe, even though 
the lost packet rate is unrealistically high. This is because the performance from the first three 
passes is relatively good and the packets from the fourth pass are not essential for reconstruction. 
Fig. 17 shows the case when packet loss occurs in pass 1. Clearly there are visible defects in 
the motion period. What is worse, the error will propagate to the following frames. Apparently, 
the replenishing scheme used here is not sufficient in areas with motion. It is believed that 
this inconsistency can be eliminated with a motion compensator algorithm which would find the 
appropriate area for replenishment, and with error concealment which limits the propagation of error. 


VII. CONCLUSIONS 

The network simulator was used only as a channel in this simulation. In fact, before the real- 
time processor is built, a lot of statistics can be collected from the network simulator to improve 
upon the coding scheme. These include transmission delays and losses from various passes under 
different network loads. For resynchronization, the delay jitter between received packets can also 
be estimated from the simulation. The environment for tomorrow's telecommunication has been 
described and requires a flexibility which is not possible in a circuit-switched network. With all 
the requirements for applying packet video in mind, MBCPT has been investigated. It is found 
that MBCPT has appealing properties, like high compression rate with good visual performance, 
robustness to packet loss, tractable integration with network mechanics and simplicity in parallel 
implementation. Some additional considerations have been proposed for the entire packet video 
system, like designing protocols, packetization, error recovery and resynchronization. For fast 
moving scenes, the differential MBCPT scheme seems insufficient. Motion compensation, error 
concealment or even attaching function commands into the coding scheme are believed to be 
useful tools to improve the performance and will be the direction of future research. 
Figure 10 Differential MBCPT coding scheme (2). 

Abstract 

A method for efficiently coding natural images using a vector- 
quantized variable-blocksized transform source coder is presented. The 
method, Mixture Block Coding (MBC), incorporates variable-rate coding 
by using a mixture of discrete cosine transform (DCT) source coders. 
Which coders are selected to code any given image region is decided through 
a threshold driven distortion criterion. In this paper, MBC is used in two 
different applications. The base method is concerned with single-pass low- 
rate image data compression. The second is a natural extension of the 
base method which allows for low-rate progressive transmission (PT). 
Since the base method adapts easily to progressive coding, it offers the 
aesthetic advantages of progressive coding without incorporating excessive 
channel overhead. Image compression rates of approximately 0.5 bit/pel 
are demonstrated for both monochrome and color images. 

1 Introduction 

Natural images contain regions of high and low detail. The 
regions of high detail are more difficult to code than those of low detail. 
Since the difference between the number of bits required to code these two 
types of regions with acceptable distortion levels can be quite large, it is 
desirable to use a variable-rate source coding method. One way to do this 
is by a transform coder which has more than one blocksize. Regions 
of high detail are coded with smaller blocks, while regions of low detail are 
coded with larger blocks. If a similar number of bits are used to code each 
of the different blocksizes, variable-rate coding is achieved. When trying 
to maximize the quality of the reconstructed image, better usage of coding 
bits can be attained with the use of vector quantization. The goal is to de- 
scribe a method of coding images using both vector quantization and 
multiple-blocksize transform coding. A block-threshold technique is used 
to select the blocksize used to code any particular region of the image. 

Both vector quantization and transform coding are block coding 
methods. Block coding methods are very useful when designing low-rate 
image compression systems, and are used almost exclusively when coding 
at low rates (<1 bit/pel). Traditional methods presented in the literature 
use either of these techniques, but have rarely used both together until very 
recently. One of the earlier publications to do this appeared in 1984 [1]. 

When using traditional vector quantizers for coding images, small 
block sizes are used to limit the size of the required quantizer codebook. 
But, when coding natural images with a very small number of bits it is 
desirable to use as large a blocksize as possible to take maximum 
advantage of the high inter-pixel correlations. In general, this blocksize is 
usually larger than vector quantization techniques can comfortably handle. 
One method to overcome this problem incorporates subsampled vector 
quantization [2], but little has been done to use traditional vector quanti- 
zation with variable-rate coding. 

Small codebooks are needed because the best performing vector 
quantization techniques (i.e., the LBG method [3]) are clustering techni- 
ques whose codebooks are very unstructured. As a result, the codebooks 
are difficult to construct and use. If a codebook is designed to be less 
computationally intensive, such as with lattice quantizers [4] or pyramid 
vector quantizers [5], the attainable distortion per codeword increases for a 
given coding rate. 


Transform coding techniques easily allow for the use of large block- 
sizes so high data compression ratios can be attained. But it is difficult 
to keep good high-detail resolution when using large transform blocksizes 
[6]. This is true since most methods code only the low-frequency high- 
energy transform coefficients [7]. As a result, the high-frequency coef- 
ficients are often ignored. Since these coefficients carry most of the 
information about the image's finer detail, image quality suffers. Even 
when the image coefficients are vector quantized so more coefficients can 
be coded for a given average rate, it can still be difficult to get good high- 
detail resolution. 

By using more than one blocksize, some of the inherent problems 
associated with low-rate transform coding can be overcome, especially 
when vector quantizers are used to code the transform coefficients. The 
vector quantizer codebooks used here are of low dimensionality to keep 
their implementation simple. This is done by limiting the number of 
coefficients coded within any given block. For the examples given below, 
the vector quantizers code at most three coefficients as a vector for 
monochrome images, and nine coefficients (three transform pels) for color 
images. Never more than four transform pels are coded within any block, 
no matter its size. 

After a block is coded using these coefficient-limited transform 
coders, the distortion is measured to see if it meets a predetermined coding 
threshold. If a block codes poorly, it is divided into four smaller blocks 
and recoded until a distortion threshold is met or the minimum blocksize 
is attained. This keeps the overall image integrity high by using the 
smaller blocks to more intensely code the high-detail regions. 

In section 2 the threshold driven MBC coding algorithm is dis- 
cussed. Also, the required overhead sent to the receiver to describe the 
final block structure of the coder is presented. Section 3 is a presentation 
of the MBC progressive transmission (MBC/PT) modification. In sections 
4 and 5 the transform coder and the vector quantizers used in the examples 
are shown. And finally, several examples are presented. 

2 The Threshold Driven Structure of MBC 

As mentioned above, each block of the image is coded using only a 
small number of transform coefficients. The difference between the 
original image and the coded image block is measured, and if the 
difference does not fall below a predetermined threshold, the block is 
divided into four smaller blocks and recoded. A new threshold is then 
applied to see if any of these blocks need to be divided further. This 
divide-and-test algorithm is continued until the entire image is coded with 
distortion that is less than the block threshold levels or the smallest 
blocksize is reached. 
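
The divide-and-test procedure can be sketched recursively as shown below. This is a minimal illustration under stated assumptions, not the coder used for the examples: the stand-in `block_mean_coder` replaces the DCT and vector quantizer coder, the threshold table is hypothetical, and a block whose distortion equals the threshold is accepted.

```python
def block_mean_coder(image, r, c, size):
    """Stand-in coder (an assumption): code a block by its rounded mean."""
    total = sum(image[i][j]
                for i in range(r, r + size) for j in range(c, c + size))
    mean = round(total / (size * size))
    return [[mean] * size for _ in range(size)]


def max_abs_diff(image, coded, r, c, size):
    """Maximum absolute difference between a block and its coded version."""
    return max(abs(image[r + i][c + j] - coded[i][j])
               for i in range(size) for j in range(size))


def divide_and_test(image, r, c, size, thresholds, min_size=2,
                    coder=block_mean_coder):
    """Return the (row, col, size) blocks that end up coding the region."""
    coded = coder(image, r, c, size)
    # The smallest blocks are always accepted; larger blocks must pass the test.
    if size <= min_size or max_abs_diff(image, coded, r, c, size) <= thresholds[size]:
        return [(r, c, size)]
    half = size // 2
    leaves = []
    for dr in (0, half):
        for dc in (0, half):
            leaves += divide_and_test(image, r + dr, c + dc, half,
                                      thresholds, min_size, coder)
    return leaves
```

For a 16x16 starting block, `thresholds` would hold one entry each for the 16, 8 and 4 blocksizes; the 2x2 blocks need no entry because they are never divided.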

The monochrome images are coded using the maximum absolute 
difference distortion measure, 

$d = \max_i |x_i - y_i|$ 

where the range of $i$ is taken over the image block being coded, and $y_i$ is 
the coded value of pixel $x_i$. For color images the maximum mean square 
difference is used, 

$d = \max_i \, (x_i - y_i)^T (x_i - y_i) / 3$ 

where $y_i$ is the coded value of color pel $x_i$. 
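
As a small worked illustration, the two distortion measures can be written as below. The function names and the data layout (a flat list of pels, with color pels as 3-component tuples) are assumptions made for the example; following the "mean square" wording of the text, no square root is taken.

```python
def max_abs_distortion(block, coded):
    """Monochrome measure: largest absolute pel error in the block."""
    return max(abs(x - y) for x, y in zip(block, coded))


def max_mean_square_distortion(block, coded):
    """Color measure: largest per-pel mean square error over 3 components."""
    return max(sum((a - b) ** 2 for a, b in zip(x, y)) / 3.0
               for x, y in zip(block, coded))
```

Both are worst-case measures, so a single badly coded pel is enough to force a block to be divided.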


'This work was supported by the NASA Goddard Space Flight Center 
under grant NAG 5-916. 





To demonstrate the MBC method consider the coding of an 
example image segment. In the following it is assumed the image is coded 
with a starting blocksize of 16x16 and a smallest blocksize of 2x2. 

First, a 16x16 block is coded, using the DCT method described 
below, and the distortion level for the block is measured. If this distortion 
is greater than the predetermined maximum level for 16x16 blocks, 
dmin(16x16), the block is divided into four 8x8 blocks for additional 
coding. After each of the 8x8 blocks is coded their resulting distortion 
levels are compared with the 8x8 distortion level, dmin(8x8). This 
process is continued until the only image blocks not meeting their given 
distortion threshold are those of size 2x2. Since 2x2 is the smallest 
allowed blocksize, these blocks are transmitted de facto, making no further 
attempt to improve their distortion level. 

Each 16x16 block can be completely coded, using all blocksizes, 
before moving on to the next 16x16 block, or all the 16x16 blocks of the 
entire image can be coded before moving to the 8x8 blocks. For MBC, 
this sequence is immaterial. The latter method allows one to develop the 
progressive technique introduced in the next section. 

Consider the example coding of a 16x16 image block shown in 
Figure 1. For clarity in the following material, let the four sub-blocks of 
an arbitrary block be numbered as shown in Figure 2, and let the block 
distortion thresholds be 

dmin(16x16) = 12, 

and 


?o this block must, be 


5. -h and 6. 


reception of images is nonaesthetic, the need to update the image on a 
frame-by-frame basis (or progressive basis) has arisen. In general, most 
methods found in the literature are concerned with perfect reconstruction 
of the image. The image is transmitted on a continually improving basis 
until it is completely transmitted so that it can be reconstructed without 
error. This requires the transmission of far more data than is needed to 
attain a visually pleasing reconstruction of the image (as is the case 
considered here). But much of this literature is directly applicable to the 
low-rate transmission of images since almost every PT method 
reconstructs a visually acceptable field within a limited number of passes 
[e.g., 12]. 

In the previous section an example 16x16 block was coded 
by passing through all of the necessary blocksizes before moving on to the 
next 16x16 block. But, if the entire image is passed through for each 
blocksize and the difference image is saved for additional coding as is 
necessary in the next pass, the MBC method can be used as a PT coder. If 
each pass is immediately transmitted, the receiver can be reconstructing a 
crude representation of the image using these larger blocksize coefficients 
while the coder is processing the next pass. All passes after the first need 
only code the image residuals. The residual coding information received 
in subsequent passes is added to the "already waiting" image of the 
receiver. The image is updated using smaller blocks so it acquires more 
clarity with each pass. Since these blocks are of smaller size, each pass 
updates higher-frequency image components than were coded in the 
previous passes. 

Since the first pass 
i'.Uhf.ush a crude in; 


nous prob.cm m wpo. 


is coded w ; ih very few bits the receiver has an 
iage. almost immediatc-ly . Since the successive 
renee irr.ae*' instead of the original there is no 
r r p ,» \ j }\ C H'i C t h O I fo T P T C C<i ing. 

egior.s that code poorly are updated in succes- 
^ns of the image which need additional coding 
to use coder resource*. This means the regions ol the image 
tail are coded quickly, and remain fixed, as the rest of the 
[iues to change as the information for each new pacs is 


information which are connected together 
inter scheme. To guarantee the receiver 
.•? ; v> a hit ) of side informal ;• is sent 
. m information. One bit of side, information 
of yre.iter than 2x2. If a block is to bo 
[f its bit valu>* is set to 0. To tell l lie 
...... od-T bl.<ks, 2 bits of side information 

1.0 juAvUTh 

c:6 blo:k is divided into blocks. The next 


the 2Vj 
the 


? bl'XV. LS d l '• ided. TIlC 

third 4 :* ■ block is divided into 2x2 bl 
; are plari.-d •with the bit maps 


asl four bits, 
■ck^. Notice 
generated at 


times th:i 
where bio 
SxS and ■ 
impoisibl 
This ir.di 
Likewise. 


different *:■ 


The high detail regions of an image are coded more than once 
when using the MBC/PT method. Not only do the high detail regions 
require a greater channel capacity to transmit their coefficients because 
smaller blocksizes are being used, but they also require channel resources 
in each of the previous passes. The rate for a MBC/PT coder is calculated 
using (1a), but the pass fractions are no longer constrained to add to one. 


•: of no Is and d, is the number of bits used to 
v-. the average coding rate for the blocks of 

h the last pass (n-th) there are no overhead 
average coding rate for the entire MBC system is 
$R = \sum_i p_i r_i$ and $\sum_i p_i = 1$. (1a,b) 

coded with i- : h pass blocksize. 

;o force the coding method of one blocksizc 
is the case for the examples of this paper. Somc- 
i;a! or, in fact, be impossible. Consider the case 
5 are code with, say, six DCT coefficients. The 
coded in a similar fashion but, of course, it is 
he a 2x2 blocks with six transform coefficients. 
1-xVx niust be coded with a different, method, 
ove. there is no reason to keep the distortion 
It ic even possible to change the distortion 
e. And. since the different blocksizes encompass 
ranees it may be desirable to do this. 

3 Progressive Transmission MBC 

n is 5 ion has grown out of need to transmit images 
width is dramatically smaller than what is avail- 
struaion of full-field imagery. Since slow-scan 


N] i p i > 1. and p.^p^, for V**/ 

Thus, applying this to (1a) shows that it is possible to have 
$R_{MBC/PT} > R_{MBC}$. But this can be offset by the fact that MBC/PT may 
converge to the original image more quickly and require fewer blocks since 
the busy sections of the image are coded with information that is taken 
from more than a single pass. As is shown in section 6, an MBC/PT image 
can require fewer coding bits to transmit than an image of similar quality 
using the MBC method. 

4 The Transform Coder 

If a large block does not adequately code a given image region, it 
is divided into smaller blocks and recoded. So there is no strict advantage 
in using a large number of coefficients to code any particular blocksize. In 
fact, there is a tradeoff between expending more effort coding the larger 
blocks so fewer smaller blocks are used, and coding the larger blocks 
minimally so as to let the threshold algorithm assign more smaller blocks for 
coding. 

For the examples of this paper it was chosen to code each block 
with only four DCT transform coefficients, including the dc and three 
lowest order frequency coefficients (Figure 4). This was done, not so much 
to attain the best overall coding rate, but to strike a median between PT 
coders which code a minimum of information about a given block [8] and 
those which code a large amount of information per block [9]. This accom- 
plishes two things. Firstly, it shows an image can be adequately coded in 
a relatively small number of passes (four for the examples here) using a 
small number of transform coefficients at each pass, and secondly, it shows 
that this can be done using a simple coding algorithm for each pass. In 
addition, when using the same transform coder for each pass it is also 
possible to share the same vector quantizer between all of the passes. 
This saves quantizer design effort. 
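
The four-coefficient transform step can be sketched as below. This is an unoptimized illustration under stated assumptions: an orthonormal 2-D DCT-II is used, the coefficients are left unquantized, and the three retained ac positions are assumed to be (0,1), (1,0) and (1,1), whereas the actual positions are those defined by Figure 4.

```python
import math


def _basis(k, i, n):
    """Orthonormal DCT-II basis value for frequency k at sample i."""
    scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
    return scale * math.cos((2 * i + 1) * k * math.pi / (2 * n))


def dct2(block):
    """2-D DCT of a square block given as a list of rows."""
    n = len(block)
    return [[sum(block[i][j] * _basis(u, i, n) * _basis(v, j, n)
                 for i in range(n) for j in range(n))
             for v in range(n)] for u in range(n)]


def idct2(coeffs):
    """Inverse of dct2."""
    n = len(coeffs)
    return [[sum(coeffs[u][v] * _basis(u, i, n) * _basis(v, j, n)
                 for u in range(n) for v in range(n))
             for j in range(n)] for i in range(n)]


# Assumed positions of the dc and the three lowest order coefficients.
KEPT = {(0, 0), (0, 1), (1, 0), (1, 1)}


def code_block(block):
    """Reconstruct a block from only its four retained DCT coefficients."""
    n = len(block)
    coeffs = dct2(block)
    kept = [[coeffs[u][v] if (u, v) in KEPT else 0.0 for v in range(n)]
            for u in range(n)]
    return idct2(kept)
```

Under this assumption a 2x2 block has exactly four coefficients, so it is reproduced exactly before quantization, while a 16x16 block keeps only 4 of its 256 coefficients.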
5 The Quantizers 

The following four paragraphs describe the quantizers used to code 
the examples discussed in section 6 of this paper. The remaining para- 
graphs discuss the details of the LBG quantizers used. 

When the MBC method was used to code monochrome images, 
the dc transform coefficients were coded with an 8-bit linear scalar 
quantizer (LSQ), and the ac transform coefficients were coded as a single 
3-dimensional vector using an LBG vector quantizer whose codebook was 
of size 256. 
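
The two quantizer types can be illustrated with the sketch below. It is not the quantizer design used for the examples: the input range passed to the scalar quantizer, the mid-point reconstruction rule and the toy codebook in the test are assumptions, and an LBG-trained codebook would simply be supplied as the `codebook` argument.

```python
def lsq_encode(value, lo, hi, bits=8):
    """Uniform (linear) scalar quantizer: map value in [lo, hi) to an index."""
    levels = 1 << bits
    step = (hi - lo) / levels
    index = int((value - lo) / step)
    return min(max(index, 0), levels - 1)  # clamp out-of-range inputs


def lsq_decode(index, lo, hi, bits=8):
    """Reconstruct at the mid-point of the quantizer cell."""
    step = (hi - lo) / (1 << bits)
    return lo + (index + 0.5) * step


def vq_encode(vector, codebook):
    """Return the index of the nearest codeword in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2
                                 for a, b in zip(vector, codebook[k])))
```

With an 8-bit dc index and an 8-bit index into a 256-entry codebook, each monochrome block costs 16 bits before side information, which for a 16x16 block is 16/256 = 0.0625 bit/pel.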

When the MBC/PT method was used to code monochrome 
images, the quantizers for the first pass were different from those of 
subsequent passes. These later passes code difference images that have 
nearly zero-mean blocks, while the first pass deals with the original image, 
which is not zero mean. The dc coefficients for the first pass were coded 
with an 8-bit LSQ. In subsequent passes, the dc coefficients were coded 
with a 5-bit optimal Laplacian scalar quantizer (OLSQ) [10]. The non-dc 
coefficients were coded the same for all passes using an LBG vector 
quantizer whose codebook contained 256 vectors. 

The three non-dc coefficients of the YIQ images, when using the 
MBC method, were quantized with a vector quantizer whose codebook 
contained 1024 vectors. The dc Y-components were quantized with an 8- 
bit LSQ and the dc I- and Q-components were coded with a 5-bit OLSQ. 


hnis case, the rate is .35$ bits/pel, and the percentage of blocks coded with 
i blocks is 25.25 percent. 

To code the monochrome image with MBC/PT only requires an 
extra Oil bits/pel over MBC. The overhead needed to code any given 
image is a function of the LBG codebook, and the codebook is a function 
of the training set used. A differently constructed codebook could offer 
different results. It is interesting to note that the YIQ MBC/PT method 
requires less coding rate to obtain the same image quality (PSNR) as is 
obtained when using MBC alone. It is clear that MBC/PT does not 
require excessive overhead to add the desirable PT feature. 
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uamizers were chosen to be nonadaplive. This 
modified to more effectively quantize out-of- 
3 e coder moves from image to image [e.g.. H]- 
ta jscns. Sometimes the overhead required to 
implement such a technique can be overly expensive and the return 
acquired from it minimal. The extra effort needed stood against the 
design goal to construct a “simple to implement” vector-quantized 
adaptive transform source coder. 

6 Results 

All of the examples, as listed in Tables 3-6, use the 512x512 RGB 
woman/hat picture of the UCLA database. The monochrome examples 
use the green (G) color field, while the YIQ images are made using the 
RGB to YIQ transformation matrix of [12]. All the examples use a 
starting blocksize of 16x16 and a final blocksize of 2x2. These tables list 
the number of blocks coded for each blocksize, and the thresholds used to 
test the quality of the coding passes. Also, the MBC/PT tables list the 
average coding rate that has accumulated after each pass. 

This rate is based upon the average of the coding bits as spread 
across the entire image, without concern for what fraction of the image is 
coded within any particular pass. These rates represent the coding rate 
that is required to code the image if the coder were to stop with that 
particular pass. Since the remaining passes are yet to be coded, the image 
percentage coded within the indicated pass must be updated to include the 
image percentages coded in all of the subsequent passes. For example, 
consider the MBC/PT rate of Table 4 when stopping at 4x4 blocks. In 


Table 1. 

Vector quantizer scale factors for monochrome images. 

Blocksize    scale factor 
16x16        60 
8x8          35 
4x4          20 
2x2          10 


Table 2. 

Vector quantizer scale factors for YIQ images. 

Blocksize    scale factor 
16x16        80 
8x8          35 
4x4          20 
2x2          10 
Table 3. 

Monochrome MBC rate (bits/pel) 

Table 4. 

Monochrome MBC/PT rate (bits/pel) 

Figure 1. Example coded 16x16 block 

Figure 4. Lowest order DCT coefficients 







